Open-Meteo maintains an API for historical weather that allows for non-commercial usage of historical weather data maintained by the website.
This file builds on _v001, _v002, _v003, and _v004 to run exploratory analysis on some historical weather data.
The exploration process uses tidyverse, ranger, several generic custom functions, and several functions specific to Open Meteo processing. First, tidyverse, ranger, and the generic functions are loaded:
library(tidyverse) # tidyverse functionality is included throughout
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ranger) # predict() does not work on ranger objects unless ranger has been called
source("./Generic_Added_Utility_Functions_202105_v001.R") # Basic functions
Next, specific functions written in _v001, _v002, _v003, and _v004 are sourced:
source("./SimpleOneVar_Functions_202411_v001.R") # Functions for basic single variable analysis
source("./OpenMeteo_NextBest_202411_v001.R") # Functions for finding 'next best' predictor given existing model
source("./OpenMeteo_Functions_202411_v001.R") # Core functions for loading, processing, analysis of Open Meteo
source("./Generic_Analysis_Functions_202411_v001.R") # Additional functions for random forest and related analysis
Key mapping tables for available metrics are also copied:
hourlyMetrics <- "temperature_2m,relativehumidity_2m,dewpoint_2m,apparent_temperature,pressure_msl,surface_pressure,precipitation,rain,snowfall,cloudcover,cloudcover_low,cloudcover_mid,cloudcover_high,shortwave_radiation,direct_radiation,direct_normal_irradiance,diffuse_radiation,windspeed_10m,windspeed_100m,winddirection_10m,winddirection_100m,windgusts_10m,et0_fao_evapotranspiration,weathercode,vapor_pressure_deficit,soil_temperature_0_to_7cm,soil_temperature_7_to_28cm,soil_temperature_28_to_100cm,soil_temperature_100_to_255cm,soil_moisture_0_to_7cm,soil_moisture_7_to_28cm,soil_moisture_28_to_100cm,soil_moisture_100_to_255cm"
dailyMetrics <- "weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration"
hourlyDescription <- "Air temperature at 2 meters above ground\nRelative humidity at 2 meters above ground\nDew point temperature at 2 meters above ground\nApparent temperature is the perceived feels-like temperature combining wind chill factor, relative humidity and solar radiation\nAtmospheric air pressure reduced to mean sea level (msl) or pressure at surface. Typically pressure on mean sea level is used in meteorology. Surface pressure gets lower with increasing elevation.\nAtmospheric air pressure reduced to mean sea level (msl) or pressure at surface. Typically pressure on mean sea level is used in meteorology. Surface pressure gets lower with increasing elevation.\nTotal precipitation (rain, showers, snow) sum of the preceding hour. Data is stored with a 0.1 mm precision. If precipitation data is summed up to monthly sums, there might be small inconsistencies with the total precipitation amount.\nOnly liquid precipitation of the preceding hour including local showers and rain from large scale systems.\nSnowfall amount of the preceding hour in centimeters. For the water equivalent in millimeter, divide by 7. E.g. 7 cm snow = 10 mm precipitation water equivalent\nTotal cloud cover as an area fraction\nLow level clouds and fog up to 2 km altitude\nMid level clouds from 2 to 6 km altitude\nHigh level clouds from 6 km altitude\nShortwave solar radiation as average of the preceding hour. This is equal to the total global horizontal irradiation\nDirect solar radiation as average of the preceding hour on the horizontal plane and the normal plane (perpendicular to the sun)\nDirect solar radiation as average of the preceding hour on the horizontal plane and the normal plane (perpendicular to the sun)\nDiffuse solar radiation as average of the preceding hour\nWind speed at 10 or 100 meters above ground. Wind speed on 10 meters is the standard level.\nWind speed at 10 or 100 meters above ground. Wind speed on 10 meters is the standard level.\nWind direction at 10 or 100 meters above ground\nWind direction at 10 or 100 meters above ground\nGusts at 10 meters above ground of the indicated hour. Wind gusts in CERRA are defined as the maximum wind gusts of the preceding hour. Please consult the ECMWF IFS documentation for more information on how wind gusts are parameterized in weather models.\nET0 Reference Evapotranspiration of a well watered grass field. Based on FAO-56 Penman-Monteith equations ET0 is calculated from temperature, wind speed, humidity and solar radiation. Unlimited soil water is assumed. ET0 is commonly used to estimate the required irrigation for plants.\nWeather condition as a numeric code. Follow WMO weather interpretation codes. See table below for details. Weather code is calculated from cloud cover analysis, precipitation and snowfall. As barely no information about atmospheric stability is available, estimation about thunderstorms is not possible.\nVapor Pressure Deificit (VPD) in kilopascal (kPa). For high VPD (>1.6), water transpiration of plants increases. For low VPD (<0.4), transpiration decreases\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage temperature of different soil levels below ground.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths.\nAverage soil water content as volumetric mixing ratio at 0-7, 7-28, 28-100 and 100-255 cm depths."
dailyDescription <- "The most severe weather condition on a given day\nMaximum and minimum daily air temperature at 2 meters above ground\nMaximum and minimum daily air temperature at 2 meters above ground\nMaximum and minimum daily apparent temperature\nMaximum and minimum daily apparent temperature\nSum of daily precipitation (including rain, showers and snowfall)\nSum of daily rain\nSum of daily snowfall\nThe number of hours with rain\nSun rise and set times\nSun rise and set times\nMaximum wind speed and gusts on a day\nMaximum wind speed and gusts on a day\nDominant wind direction\nThe sum of solar radiaion on a given day in Megajoules\nDaily sum of ET0 Reference Evapotranspiration of a well watered grass field"
# Create tibble for hourly metrics
tblMetricsHourly <- tibble::tibble(metric=hourlyMetrics %>% str_split_1(","),
description=hourlyDescription %>% str_split_1("\n")
)
tblMetricsHourly %>%
print(n=50)
## # A tibble: 33 × 2
## metric description
## <chr> <chr>
## 1 temperature_2m Air temperature at 2 meters above ground
## 2 relativehumidity_2m Relative humidity at 2 meters above ground
## 3 dewpoint_2m Dew point temperature at 2 meters above ground
## 4 apparent_temperature Apparent temperature is the perceived feels-li…
## 5 pressure_msl Atmospheric air pressure reduced to mean sea l…
## 6 surface_pressure Atmospheric air pressure reduced to mean sea l…
## 7 precipitation Total precipitation (rain, showers, snow) sum …
## 8 rain Only liquid precipitation of the preceding hou…
## 9 snowfall Snowfall amount of the preceding hour in centi…
## 10 cloudcover Total cloud cover as an area fraction
## 11 cloudcover_low Low level clouds and fog up to 2 km altitude
## 12 cloudcover_mid Mid level clouds from 2 to 6 km altitude
## 13 cloudcover_high High level clouds from 6 km altitude
## 14 shortwave_radiation Shortwave solar radiation as average of the pr…
## 15 direct_radiation Direct solar radiation as average of the prece…
## 16 direct_normal_irradiance Direct solar radiation as average of the prece…
## 17 diffuse_radiation Diffuse solar radiation as average of the prec…
## 18 windspeed_10m Wind speed at 10 or 100 meters above ground. W…
## 19 windspeed_100m Wind speed at 10 or 100 meters above ground. W…
## 20 winddirection_10m Wind direction at 10 or 100 meters above ground
## 21 winddirection_100m Wind direction at 10 or 100 meters above ground
## 22 windgusts_10m Gusts at 10 meters above ground of the indicat…
## 23 et0_fao_evapotranspiration ET0 Reference Evapotranspiration of a well wat…
## 24 weathercode Weather condition as a numeric code. Follow WM…
## 25 vapor_pressure_deficit Vapor Pressure Deificit (VPD) in kilopascal (k…
## 26 soil_temperature_0_to_7cm Average temperature of different soil levels b…
## 27 soil_temperature_7_to_28cm Average temperature of different soil levels b…
## 28 soil_temperature_28_to_100cm Average temperature of different soil levels b…
## 29 soil_temperature_100_to_255cm Average temperature of different soil levels b…
## 30 soil_moisture_0_to_7cm Average soil water content as volumetric mixin…
## 31 soil_moisture_7_to_28cm Average soil water content as volumetric mixin…
## 32 soil_moisture_28_to_100cm Average soil water content as volumetric mixin…
## 33 soil_moisture_100_to_255cm Average soil water content as volumetric mixin…
# Create tibble for daily metrics
tblMetricsDaily <- tibble::tibble(metric=dailyMetrics %>% str_split_1(","),
description=dailyDescription %>% str_split_1("\n")
)
tblMetricsDaily
## # A tibble: 16 × 2
## metric description
## <chr> <chr>
## 1 weathercode The most severe weather condition on a given day
## 2 temperature_2m_max Maximum and minimum daily air temperature at 2 me…
## 3 temperature_2m_min Maximum and minimum daily air temperature at 2 me…
## 4 apparent_temperature_max Maximum and minimum daily apparent temperature
## 5 apparent_temperature_min Maximum and minimum daily apparent temperature
## 6 precipitation_sum Sum of daily precipitation (including rain, showe…
## 7 rain_sum Sum of daily rain
## 8 snowfall_sum Sum of daily snowfall
## 9 precipitation_hours The number of hours with rain
## 10 sunrise Sun rise and set times
## 11 sunset Sun rise and set times
## 12 windspeed_10m_max Maximum wind speed and gusts on a day
## 13 windgusts_10m_max Maximum wind speed and gusts on a day
## 14 winddirection_10m_dominant Dominant wind direction
## 15 shortwave_radiation_sum The sum of solar radiaion on a given day in Megaj…
## 16 et0_fao_evapotranspiration Daily sum of ET0 Reference Evapotranspiration of …
A previously existing dataset is loaded, with key analysis variables defined in a vector:
# Load previous data
allCity <- readFromRDS("allCity_20241116")
# Get core training variables
varsTrain <- getVarsTrain(allCity)
varsTrain
## [1] "hour" "temperature_2m"
## [3] "relativehumidity_2m" "dewpoint_2m"
## [5] "apparent_temperature" "pressure_msl"
## [7] "surface_pressure" "precipitation"
## [9] "rain" "snowfall"
## [11] "cloudcover" "cloudcover_low"
## [13] "cloudcover_mid" "cloudcover_high"
## [15] "shortwave_radiation" "direct_radiation"
## [17] "direct_normal_irradiance" "diffuse_radiation"
## [19] "windspeed_10m" "windspeed_100m"
## [21] "winddirection_10m" "winddirection_100m"
## [23] "windgusts_10m" "et0_fao_evapotranspiration"
## [25] "weathercode" "vapor_pressure_deficit"
## [27] "soil_temperature_0_to_7cm" "soil_temperature_7_to_28cm"
## [29] "soil_temperature_28_to_100cm" "soil_temperature_100_to_255cm"
## [31] "soil_moisture_0_to_7cm" "soil_moisture_7_to_28cm"
## [33] "soil_moisture_28_to_100cm" "soil_moisture_100_to_255cm"
## [35] "year" "doy"
# Assign default label
keyLabel <- genericKeyLabelOM()
keyLabel
## [1] "predictions based on pre-2022 training data applied to 2022 holdout dataset"
The correlation heatmap is reproduced, with functions that borrowing from the recipe at STHDA:
# Default function
corVarsTrain <- makeHeatMap(allCity, vecSelect=varsTrain, returnData=TRUE)
corVarsTrain %>% filter(Var1!=Var2) %>% arrange(desc(value)) %>% print(n=20)
## # A tibble: 630 × 3
## Var1 Var2 value
## <fct> <fct> <dbl>
## 1 rain precipitation 0.989
## 2 apparent_temperature temperature_2m 0.984
## 3 direct_radiation shortwave_radiation 0.974
## 4 temperature_2m soil_temperature_0_to_7cm 0.962
## 5 soil_temperature_28_to_100cm soil_temperature_7_to_28cm 0.952
## 6 shortwave_radiation et0_fao_evapotranspiration 0.944
## 7 apparent_temperature soil_temperature_0_to_7cm 0.942
## 8 soil_moisture_7_to_28cm soil_moisture_0_to_7cm 0.940
## 9 windspeed_100m windspeed_10m 0.940
## 10 direct_radiation et0_fao_evapotranspiration 0.928
## 11 direct_radiation direct_normal_irradiance 0.922
## 12 soil_moisture_28_to_100cm soil_moisture_7_to_28cm 0.922
## 13 soil_temperature_0_to_7cm soil_temperature_7_to_28cm 0.919
## 14 apparent_temperature soil_temperature_7_to_28cm 0.918
## 15 temperature_2m soil_temperature_7_to_28cm 0.917
## 16 shortwave_radiation direct_normal_irradiance 0.901
## 17 windspeed_10m windgusts_10m 0.884
## 18 soil_temperature_100_to_255cm soil_temperature_28_to_100cm 0.863
## 19 soil_moisture_100_to_255cm soil_moisture_28_to_100cm 0.859
## 20 apparent_temperature soil_temperature_28_to_100cm 0.853
## # ℹ 610 more rows
The correlation heatmap is produced for a single city:
corVarsTrainBOS <- makeHeatMap(allCity %>% filter(src=="Boston"), vecSelect=varsTrain, returnData=TRUE)
corVarsTrainBOS %>% filter(Var1!=Var2) %>% arrange(desc(value)) %>% print(n=20)
## # A tibble: 630 × 3
## Var1 Var2 value
## <fct> <fct> <dbl>
## 1 surface_pressure pressure_msl 1.00
## 2 apparent_temperature temperature_2m 0.993
## 3 temperature_2m soil_temperature_0_to_7cm 0.965
## 4 apparent_temperature soil_temperature_0_to_7cm 0.965
## 5 soil_temperature_0_to_7cm soil_temperature_7_to_28cm 0.965
## 6 direct_radiation shortwave_radiation 0.962
## 7 rain precipitation 0.959
## 8 soil_temperature_28_to_100cm soil_temperature_7_to_28cm 0.950
## 9 windspeed_100m windspeed_10m 0.946
## 10 apparent_temperature dewpoint_2m 0.937
## 11 shortwave_radiation et0_fao_evapotranspiration 0.935
## 12 windspeed_10m windgusts_10m 0.927
## 13 apparent_temperature soil_temperature_7_to_28cm 0.925
## 14 temperature_2m dewpoint_2m 0.917
## 15 direct_radiation direct_normal_irradiance 0.916
## 16 soil_moisture_7_to_28cm soil_moisture_0_to_7cm 0.915
## 17 temperature_2m soil_temperature_7_to_28cm 0.914
## 18 direct_radiation et0_fao_evapotranspiration 0.902
## 19 winddirection_100m winddirection_10m 0.895
## 20 soil_temperature_0_to_7cm soil_temperature_28_to_100cm 0.886
## # ℹ 610 more rows
Some variables, such as surface pressure vs. MSL pressure, are much differently correlated if controlling for city. Differences are explored:
tstCorDelta <- corVarsTrain %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value_all=value) %>%
full_join(corVarsTrainBOS %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value_bos=value),
by=c("v1", "v2")
) %>%
mutate(delta=value_bos-value_all)
tstCorDelta
## # A tibble: 666 × 5
## v1 v2 value_all value_bos delta
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 dewpoint_2m dewpoint_2m 1 1 0
## 2 dewpoint_2m soil_temperature_7_to_28cm 0.675 0.874 0.199
## 3 dewpoint_2m soil_temperature_28_to_100cm 0.630 0.807 0.177
## 4 dewpoint_2m soil_temperature_0_to_7cm 0.623 0.875 0.252
## 5 dewpoint_2m temperature_2m 0.691 0.917 0.226
## 6 apparent_temperature dewpoint_2m 0.785 0.937 0.152
## 7 dewpoint_2m soil_temperature_100_to_255cm 0.448 0.505 0.0567
## 8 dewpoint_2m doy 0.185 0.312 0.127
## 9 dewpoint_2m hour 0.00764 0.0203 0.0127
## 10 dewpoint_2m vapor_pressure_deficit -0.0678 0.305 0.373
## # ℹ 656 more rows
tstCorDelta %>% arrange(delta) %>% head(20)
## # A tibble: 20 × 5
## v1 v2 value_all value_bos delta
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 relativehumidity_2m surface_pressure 0.578 -0.160 -0.738
## 2 soil_moisture_0_to_7cm surface_pressure 0.594 -0.0914 -0.685
## 3 soil_moisture_7_to_28cm surface_pressure 0.586 -0.0240 -0.610
## 4 dewpoint_2m surface_pressure 0.364 -0.237 -0.601
## 5 soil_moisture_100_to_255cm surface_pressure 0.602 0.00609 -0.596
## 6 soil_moisture_28_to_100cm surface_pressure 0.588 0.0215 -0.566
## 7 surface_pressure windspeed_100m 0.247 -0.311 -0.558
## 8 dewpoint_2m soil_moisture_28_to_10… 0.0416 -0.508 -0.549
## 9 surface_pressure windspeed_10m 0.195 -0.304 -0.498
## 10 dewpoint_2m soil_moisture_7_to_28cm 0.0435 -0.448 -0.491
## 11 surface_pressure windgusts_10m 0.133 -0.332 -0.465
## 12 dewpoint_2m soil_moisture_0_to_7cm 0.0722 -0.393 -0.465
## 13 relativehumidity_2m soil_moisture_28_to_10… 0.401 -0.0532 -0.454
## 14 relativehumidity_2m soil_moisture_100_to_2… 0.374 -0.0545 -0.429
## 15 cloudcover surface_pressure 0.279 -0.144 -0.423
## 16 relativehumidity_2m soil_moisture_7_to_28cm 0.441 0.0357 -0.406
## 17 soil_moisture_0_to_7cm soil_moisture_100_to_2… 0.796 0.402 -0.393
## 18 cloudcover_low surface_pressure 0.197 -0.186 -0.383
## 19 surface_pressure weathercode 0.140 -0.240 -0.380
## 20 cloudcover_mid surface_pressure 0.142 -0.221 -0.362
tstCorDelta %>% arrange(desc(delta)) %>% head(20)
## # A tibble: 20 × 5
## v1 v2 value_all value_bos delta
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 soil_moisture_100_to_255cm year 0.0387 0.739 0.701
## 2 pressure_msl surface_pressure 0.401 1.00 0.599
## 3 soil_moisture_28_to_100cm year -0.0677 0.386 0.454
## 4 soil_moisture_7_to_28cm year -0.0426 0.406 0.449
## 5 surface_pressure vapor_pressure_deficit -0.493 -0.0499 0.443
## 6 dewpoint_2m vapor_pressure_deficit -0.0678 0.305 0.373
## 7 soil_moisture_100_to_255cm vapor_pressure_deficit -0.337 -0.00259 0.334
## 8 soil_moisture_0_to_7cm year 0.0110 0.334 0.323
## 9 doy soil_temperature_100_to… 0.453 0.764 0.311
## 10 relativehumidity_2m soil_temperature_0_to_7… -0.253 0.00738 0.260
## 11 dewpoint_2m soil_temperature_0_to_7… 0.623 0.875 0.252
## 12 relativehumidity_2m soil_temperature_28_to_… -0.133 0.116 0.249
## 13 relativehumidity_2m temperature_2m -0.213 0.0356 0.249
## 14 relativehumidity_2m soil_temperature_7_to_2… -0.139 0.109 0.247
## 15 relativehumidity_2m soil_temperature_100_to… -0.120 0.123 0.243
## 16 pressure_msl vapor_pressure_deficit -0.292 -0.0521 0.240
## 17 dewpoint_2m temperature_2m 0.691 0.917 0.226
## 18 cloudcover temperature_2m -0.162 0.0534 0.215
## 19 direct_normal_irradiance surface_pressure -0.118 0.0838 0.201
## 20 dewpoint_2m soil_temperature_7_to_2… 0.675 0.874 0.199
The process is repeated, using an aggregation of correlations in each of the seven cities:
keyCities <- allCity %>% pull(src) %>% unique()
keyCities
## [1] "NYC" "LA" "Chicago" "Houston" "Vegas" "Miami" "Boston"
corVarsTrainEach <- keyCities %>%
map_dfr(.f=function(x) makeHeatMap(allCity %>% filter(src==x),
vecSelect=varsTrain[!varsTrain %in% ifelse(x=="Miami", "snowfall", "")],
returnData=TRUE,
plotMap=FALSE
) %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value), .id="src") %>%
mutate(city=keyCities[as.integer(src)]) %>%
select(src, city, everything())
corVarsTrainEach
## # A tibble: 4,626 × 5
## src city v1 v2 value
## <chr> <chr> <chr> <chr> <dbl>
## 1 1 NYC soil_temperature_7_to_28cm soil_temperature_7_to_28cm 1
## 2 1 NYC soil_temperature_28_to_100cm soil_temperature_7_to_28cm 0.952
## 3 1 NYC dewpoint_2m soil_temperature_7_to_28cm 0.883
## 4 1 NYC soil_temperature_0_to_7cm soil_temperature_7_to_28cm 0.954
## 5 1 NYC soil_temperature_7_to_28cm temperature_2m 0.932
## 6 1 NYC apparent_temperature soil_temperature_7_to_28cm 0.937
## 7 1 NYC soil_temperature_100_to_255cm soil_temperature_7_to_28cm 0.584
## 8 1 NYC doy soil_temperature_7_to_28cm 0.360
## 9 1 NYC cloudcover_mid soil_temperature_7_to_28cm -0.100
## 10 1 NYC cloudcover_high soil_temperature_7_to_28cm 0.0134
## # ℹ 4,616 more rows
corVarsTrainEach %>%
arrange(desc(value)) %>%
filter(v1!=v2) %>%
slice(1:15, (nrow(.)-14):nrow(.)) %>%
print(n=30)
## # A tibble: 30 × 5
## src city v1 v2 value
## <chr> <chr> <chr> <chr> <dbl>
## 1 6 Miami precipitation rain 1
## 2 6 Miami pressure_msl surface_pressure 1.00
## 3 7 Boston pressure_msl surface_pressure 1.00
## 4 2 LA precipitation rain 1.00
## 5 4 Houston pressure_msl surface_pressure 1.00
## 6 4 Houston precipitation rain 1.00
## 7 1 NYC pressure_msl surface_pressure 1.00
## 8 5 Vegas precipitation rain 0.999
## 9 3 Chicago pressure_msl surface_pressure 0.994
## 10 1 NYC apparent_temperature temperature_2m 0.993
## 11 3 Chicago apparent_temperature temperature_2m 0.993
## 12 7 Boston apparent_temperature temperature_2m 0.993
## 13 5 Vegas direct_radiation shortwave_radiation 0.989
## 14 5 Vegas apparent_temperature temperature_2m 0.988
## 15 3 Chicago precipitation rain 0.988
## 16 1 NYC soil_moisture_0_to_7cm soil_temperature_7_to_28cm -0.680
## 17 1 NYC soil_moisture_7_to_28cm soil_temperature_0_to_7cm -0.681
## 18 1 NYC soil_moisture_100_to_255cm soil_temperature_100_to_255cm -0.687
## 19 1 NYC soil_moisture_28_to_100cm soil_temperature_100_to_255cm -0.688
## 20 4 Houston et0_fao_evapotranspiration relativehumidity_2m -0.689
## 21 1 NYC soil_moisture_28_to_100cm soil_temperature_7_to_28cm -0.707
## 22 1 NYC soil_moisture_7_to_28cm soil_temperature_28_to_100cm -0.715
## 23 1 NYC soil_moisture_7_to_28cm soil_temperature_7_to_28cm -0.720
## 24 3 Chicago soil_moisture_100_to_255cm year -0.758
## 25 1 NYC soil_moisture_28_to_100cm soil_temperature_28_to_100cm -0.777
## 26 4 Houston relativehumidity_2m vapor_pressure_deficit -0.791
## 27 2 LA relativehumidity_2m vapor_pressure_deficit -0.818
## 28 5 Vegas soil_moisture_28_to_100cm year -0.828
## 29 5 Vegas soil_moisture_7_to_28cm year -0.831
## 30 6 Miami relativehumidity_2m vapor_pressure_deficit -0.868
Comparison of aggregated individual city correlations to the overall correlations:
corVarsTrainDelta <- corVarsTrain %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value_all=value) %>%
full_join(corVarsTrainEach %>%
group_by(v1, v2) %>%
summarize(value=mean(value), .groups="drop"),
by=c("v1", "v2")) %>%
mutate(delta=value-value_all)
corVarsTrainDelta %>%
arrange(delta) %>%
slice(1:15, (nrow(.)-14):nrow(.)) %>%
print(n=30)
## # A tibble: 30 × 5
## v1 v2 value_all value delta
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 relativehumidity_2m surface_pressure 0.578 -0.178 -0.756
## 2 dewpoint_2m surface_pressure 0.364 -0.371 -0.735
## 3 soil_moisture_100_to_255cm soil_moisture_7_to_28cm 0.824 0.156 -0.668
## 4 soil_moisture_0_to_7cm soil_moisture_100_to_25… 0.796 0.138 -0.658
## 5 soil_moisture_0_to_7cm surface_pressure 0.594 -0.0212 -0.615
## 6 soil_moisture_100_to_255cm surface_pressure 0.602 -0.00359 -0.606
## 7 soil_moisture_100_to_255cm soil_moisture_28_to_100… 0.859 0.283 -0.576
## 8 soil_moisture_7_to_28cm surface_pressure 0.586 0.0143 -0.572
## 9 soil_moisture_28_to_100cm surface_pressure 0.588 0.0277 -0.560
## 10 cloudcover surface_pressure 0.279 -0.138 -0.417
## 11 surface_pressure windspeed_100m 0.247 -0.165 -0.412
## 12 relativehumidity_2m soil_moisture_100_to_25… 0.374 -0.00330 -0.378
## 13 soil_moisture_0_to_7cm soil_moisture_28_to_100… 0.842 0.491 -0.351
## 14 surface_pressure windspeed_10m 0.195 -0.149 -0.344
## 15 relativehumidity_2m soil_moisture_28_to_100… 0.401 0.0674 -0.333
## 16 vapor_pressure_deficit windspeed_10m -0.0519 0.100 0.152
## 17 relativehumidity_2m soil_temperature_100_to… -0.120 0.0325 0.153
## 18 et0_fao_evapotranspiration soil_moisture_100_to_25… -0.138 0.0221 0.160
## 19 diffuse_radiation vapor_pressure_deficit 0.283 0.449 0.166
## 20 soil_moisture_28_to_100cm vapor_pressure_deficit -0.404 -0.234 0.170
## 21 soil_moisture_100_to_255cm temperature_2m -0.204 -0.0213 0.183
## 22 soil_moisture_100_to_255cm soil_temperature_0_to_7… -0.221 -0.0296 0.192
## 23 soil_moisture_100_to_255cm soil_temperature_28_to_… -0.308 -0.115 0.193
## 24 dewpoint_2m vapor_pressure_deficit -0.0678 0.136 0.203
## 25 soil_moisture_100_to_255cm soil_temperature_7_to_2… -0.257 -0.0527 0.205
## 26 direct_normal_irradiance surface_pressure -0.118 0.0875 0.205
## 27 doy soil_temperature_100_to… 0.453 0.739 0.286
## 28 soil_moisture_100_to_255cm vapor_pressure_deficit -0.337 -0.0176 0.319
## 29 surface_pressure vapor_pressure_deficit -0.493 -0.0665 0.427
## 30 pressure_msl surface_pressure 0.401 0.983 0.582
Differences by variable are explored:
corVarsTrain %>%
mutate(v1=pmin(as.character(Var1), as.character(Var2)),
v2=pmax(as.character(Var1), as.character(Var2))
) %>%
select(v1, v2, value_all=value) %>%
full_join(corVarsTrainEach, by=c("v1", "v2")) %>%
mutate(delta=value-value_all) %>%
filter(v1!=v2) %>%
select(city, v1, v2, delta) %>%
bind_rows(., ., .id="src") %>%
mutate(vrbl=case_when(src=="1"~v1, src=="2"~v2, TRUE~NA)) %>%
ggplot(aes(x=fct_reorder(vrbl, delta, .fun=function(x) diff(range(x))), y=delta)) +
geom_boxplot(fill="lightblue") +
coord_flip() +
geom_hline(yintercept=0, color="black", lty=2) +
labs(title="Correlation delta (individual minus all-city aggregate)",
x=NULL,
y="Delta (individual minus all-city aggregate)",
subtitle="Calculated for each variable combination and city"
)
Soil moisture, surface pressure, and dewpoint are among the variables with the highest changes in correlation when calculated n individual cities and the all-city aggregate
An example Simpson’s paradox is MSL pressure vs. surface pressure:
allCity %>%
count(src, surface_pressure, pressure_msl) %>%
ggplot(aes(x=surface_pressure, pressure_msl)) +
geom_smooth(aes(weight=n, color=src), method="lm") +
geom_smooth(method="lm", lty=2, aes(weight=n), color="black") +
labs(title="Relationship between MSL pressure and surface pressure",
subtitle="Dashed black line is overall relationship"
) +
scale_color_discrete(NULL)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
The process is converted to functional form:
tmpSmoothPlot <- function(df, x, y, xName=x, yName=y, printPlot=TRUE, returnPlot=!isTRUE(printPlot)) {
# FUNCTION ARGUMENTS:
# df: the data frame
# x: x variable
# y: y variable
# xName: name to describe x variable
# yName: name to describe y variable
# printPlot: boolean, should plot be printed?
# returnPlot: boolean, should plot object be returned?
p1 <- df %>%
select(src, all_of(c(x, y))) %>%
purrr::set_names(c("src", "x1", "y1")) %>%
count(src, x1, y1) %>%
ggplot(aes(x=x1, y=y1)) +
geom_smooth(aes(weight=n, color=src), method="lm") +
geom_smooth(method="lm", lty=2, aes(weight=n), color="black") +
labs(title=paste0("Relationship between ", xName, " and ", yName),
subtitle="Dashed black line is overall relationship",
y=if(y!=yName) paste0(yName, "\n(", y, ")") else y,
x=if(x!=xName) paste0(xName, "\n(", x, ")") else x
) +
scale_color_discrete(NULL)
# Print plot if requested
if(isTRUE(printPlot)) print(p1)
# Return plot if requested
if(isTRUE(returnPlot)) return(p1)
}
# Example function call
tmpSmoothPlot(allCity,
x="surface_pressure",
y="pressure_msl",
yName="MSL Pressure",
xName="Surface Pressure"
)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
The function is tested on two additional sets of metrics:
gridExtra::grid.arrange(tmpSmoothPlot(allCity %>%
mutate(across(c("apparent_temperature", "dewpoint_2m"),
.fns=function(x) round(x)
)
),
x="dewpoint_2m",
y="apparent_temperature",
xName="Dew Point",
yName="Apparent Temperature",
printPlot=FALSE
),
tmpSmoothPlot(allCity %>%
mutate(across(c("surface_pressure", "dewpoint_2m"),
.fns=function(x) round(x)
)
),
x="dewpoint_2m",
y="surface_pressure",
xName="Dew Point",
yName="Surface Pressure",
printPlot=FALSE
),
nrow=1
)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
The function is updated to allow for rounding:
tmpSmoothPlot <- function(df,
x,
y,
xRound=NULL,
yRound=NULL,
xName=x,
yName=y,
printPlot=TRUE,
returnPlot=!isTRUE(printPlot)
) {
# FUNCTION ARGUMENTS:
# df: the data frame
# x: x variable
# y: y variable
# {x,y}Round: rounding to apply to vector {x,y} using function autoRound()
# NULL means no rounding (default)
# -1L means make an estimate based on data (around 100 buckets created)
# a positive float or integer means round everything to the nearest multiple
# xName: name to describe x variable
# yName: name to describe y variable
# printPlot: boolean, should plot be printed?
# returnPlot: boolean, should plot object be returned?
p1 <- df %>%
select(src, all_of(c(x, y))) %>%
purrr::set_names(c("src", "x1", "y1")) %>%
mutate(x1=autoRound(x1, rndTo=xRound),
y1=autoRound(y1, rndTo=yRound)
) %>%
count(src, x1, y1) %>%
ggplot(aes(x=x1, y=y1)) +
geom_smooth(aes(weight=n, color=src), method="lm") +
geom_smooth(method="lm", lty=2, aes(weight=n), color="black") +
labs(title=paste0("Relationship between ", xName, " and ", yName),
subtitle="Dashed black line is overall relationship",
y=if(y!=yName) paste0(yName, "\n(", y, ")") else y,
x=if(x!=xName) paste0(xName, "\n(", x, ")") else x
) +
scale_color_discrete(NULL)
# Print plot if requested
if(isTRUE(printPlot)) print(p1)
# Return plot if requested
if(isTRUE(returnPlot)) return(p1)
}
The updated function is tested on elements with many buckets:
# Example function call (no rounding)
t0 <- Sys.time()
system.time(
tmpSmoothPlot(allCity,
x="direct_normal_irradiance",
y="shortwave_radiation",
xName="Direct Solar Radiation",
yName="Shortwave Solar Radiation"
)
)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## user system elapsed
## 4.11 0.57 4.69
Sys.time() - t0
## Time difference of 4.844228 secs
# Example function call (rounding)
t0 <- Sys.time()
system.time(
tmpSmoothPlot(allCity,
x="direct_normal_irradiance",
y="shortwave_radiation",
xRound=-1L,
yRound=-1L,
xName="Direct Solar Radiation",
yName="Shortwave Solar Radiation"
)
)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## user system elapsed
## 1.04 0.22 1.36
Sys.time() - t0
## Time difference of 1.605754 secs
A simple model is run to predict surface pressure as a function of dewpoint, without considering location:
# Create model
tstLM <- allCity %>%
select(dewpoint_2m, surface_pressure) %>%
lm(surface_pressure ~ dewpoint_2m, data=.)
# Summary
summary(tstLM)
##
## Call:
## lm(formula = surface_pressure ~ dewpoint_2m, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82.042 -15.314 5.729 17.119 83.092
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.871e+02 3.631e-02 27182.2 <2e-16 ***
## dewpoint_2m 9.507e-01 2.632e-03 361.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.72 on 854206 degrees of freedom
## Multiple R-squared: 0.1325, Adjusted R-squared: 0.1325
## F-statistic: 1.305e+05 on 1 and 854206 DF, p-value: < 2.2e-16
# Review of predictions
allCity %>%
select(src, dewpoint_2m, surface_pressure) %>%
mutate(pred=predict(tstLM, newdata=.)) %>%
mutate(across(c(surface_pressure, pred), .fns=function(x) autoRound(x))) %>%
count(surface_pressure, pred) %>%
ggplot(aes(x=pred, y=surface_pressure)) +
geom_point(aes(size=n), alpha=0.25) +
geom_smooth(aes(weight=n), method="lm") +
geom_abline(slope=1, intercept=0, lty=2, color="red") +
labs(x="Prediction",
y="Actual",
title="Predictions for Surface Pressure",
subtitle="Surface Pressure ~ Dewpoint"
) +
scale_size_continuous(NULL)
## `geom_smooth()` using formula = 'y ~ x'
The model is updated to predict surface pressure as a function of dewpoint, considering location:
# Create model
tstLM_002 <- allCity %>%
select(dewpoint_2m, surface_pressure, src) %>%
lm(surface_pressure ~ dewpoint_2m:src + src, data=.)
# Summary
summary(tstLM_002)
##
## Call:
## lm(formula = surface_pressure ~ dewpoint_2m:src + src, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.856 -2.901 0.233 3.277 28.618
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.017e+03 1.844e-02 55129.10 <2e-16 ***
## srcChicago -2.021e+01 2.594e-02 -779.18 <2e-16 ***
## srcHouston 4.918e+00 4.032e-02 121.98 <2e-16 ***
## srcLA -3.904e+01 2.910e-02 -1341.57 <2e-16 ***
## srcMiami 5.985e+00 7.381e-02 81.09 <2e-16 ***
## srcNYC -2.917e+00 2.700e-02 -108.04 <2e-16 ***
## srcVegas -8.141e+01 2.487e-02 -3273.38 <2e-16 ***
## dewpoint_2m:srcBoston -1.907e-01 1.561e-03 -122.19 <2e-16 ***
## dewpoint_2m:srcChicago -2.323e-01 1.493e-03 -155.62 <2e-16 ***
## dewpoint_2m:srcHouston -4.411e-01 2.017e-03 -218.66 <2e-16 ***
## dewpoint_2m:srcLA -2.287e-01 2.261e-03 -101.17 <2e-16 ***
## dewpoint_2m:srcMiami -3.044e-01 3.512e-03 -86.66 <2e-16 ***
## dewpoint_2m:srcNYC -1.990e-01 1.607e-03 -123.81 <2e-16 ***
## dewpoint_2m:srcVegas -1.471e-01 2.120e-03 -69.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.725 on 854194 degrees of freedom
## Multiple R-squared: 0.9602, Adjusted R-squared: 0.9602
## F-statistic: 1.585e+06 on 13 and 854194 DF, p-value: < 2.2e-16
# Review of predictions
allCity %>%
select(src, dewpoint_2m, surface_pressure) %>%
mutate(pred=predict(tstLM_002, newdata=.)) %>%
mutate(across(c(surface_pressure, pred), .fns=function(x) autoRound(x))) %>%
count(src, surface_pressure, pred) %>%
ggplot(aes(x=pred, y=surface_pressure)) +
geom_point(aes(size=n, color=src), alpha=0.25) +
geom_smooth(aes(weight=n), method="lm") +
geom_abline(slope=1, intercept=0, lty=2, color="red") +
labs(x="Prediction",
y="Actual",
title="Predictions for Surface Pressure",
subtitle="Surface Pressure ~ Dewpoint:City + City"
) +
scale_size_continuous(NULL) +
scale_color_discrete(NULL)
## `geom_smooth()` using formula = 'y ~ x'
Long term daily data is downloaded for Atlanta:
# Daily data download for Atlanta, GA
testURLDaily <- helperOpenMeteoURL(cityName="Atlanta GA",
dailyIndices=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31",
tz="US/Eastern"
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=33.76&longitude=-84.42&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FEastern"
# Download file
if(!file.exists("testOM_daily_atl.json")) {
fileDownload(fileName="testOM_daily_atl.json", url=testURLDaily)
} else {
cat("\nFile testOM_daily_atl.json already exists, skipping download\n")
}
##
## File testOM_daily_atl.json already exists, skipping download
The daily dataset is loaded:
# Read daily JSON file
atlOMDaily <- formatOpenMeteoJSON("testOM_daily_atl.json")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## $tblDaily
## # A tibble: 23,376 × 18
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <int> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 71 2.6 0.1
## 2 1960-01-02 1960-01-02 61 11.4 1.5
## 3 1960-01-03 1960-01-03 63 14.7 2
## 4 1960-01-04 1960-01-04 53 7.3 -0.1
## 5 1960-01-05 1960-01-05 51 7.1 0.2
## 6 1960-01-06 1960-01-06 53 7.3 4.5
## 7 1960-01-07 1960-01-07 61 7.4 3.1
## 8 1960-01-08 1960-01-08 2 10.7 0.4
## 9 1960-01-09 1960-01-09 3 15 -0.8
## 10 1960-01-10 1960-01-10 3 17.5 4
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## # windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>
##
## $tblHourly
## NULL
##
## $tblUnits
## # A tibble: 17 × 4
## metricType name value description
## <chr> <chr> <chr> <chr>
## 1 daily_units time "iso8601" <NA>
## 2 daily_units weathercode "wmo code" The most severe weather co…
## 3 daily_units temperature_2m_max "deg C" Maximum and minimum daily …
## 4 daily_units temperature_2m_min "deg C" Maximum and minimum daily …
## 5 daily_units apparent_temperature_max "deg C" Maximum and minimum daily …
## 6 daily_units apparent_temperature_min "deg C" Maximum and minimum daily …
## 7 daily_units precipitation_sum "mm" Sum of daily precipitation…
## 8 daily_units rain_sum "mm" Sum of daily rain
## 9 daily_units snowfall_sum "cm" Sum of daily snowfall
## 10 daily_units precipitation_hours "h" The number of hours with r…
## 11 daily_units sunrise "iso8601" Sun rise and set times
## 12 daily_units sunset "iso8601" Sun rise and set times
## 13 daily_units windspeed_10m_max "km/h" Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max "km/h" Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg " Dominant wind direction
## 16 daily_units shortwave_radiation_sum "MJ/m²" The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm" Daily sum of ET0 Reference…
##
## $tblDescription
## # A tibble: 1 × 7
## latitude longitude generationtime_ms utc_offset_seconds timezone
## <dbl> <dbl> <dbl> <int> <chr>
## 1 33.8 -84.4 388. -18000 US/Eastern
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
##
##
## latitude: 33.77856
## longitude: -84.40299
## generationtime_ms: 388.082
## utc_offset_seconds: -18000
## timezone: US/Eastern
## timezone_abbreviation: EST
## elevation: 302
# Sample records of tibble
atlOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <int> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ windspeed_10m_max <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…
Variables are converted to proper data type:
dfDailyATL <- atlOMDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
glimpse(dfDailyATL)
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise <dttm> 1960-01-01 07:42:00, 1960-01-02 07:42:00, …
## $ sunset <dttm> 1960-01-01 17:39:00, 1960-01-02 17:40:00, …
## $ windspeed_10m_max <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…
## $ sunrise_chr <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset_chr <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ fct_winddir <fct> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
Averages by month for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year, month) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month)) %>%
ggplot(aes(x=month, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Atlanta 1960-2023)")
Averages by month for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(month, wc, winddir) %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=month, y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Atlanta 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Averages by year for select variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
pivot_longer(-c(year)) %>%
ggplot(aes(x=year, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Annual averages for key metrics (Atlanta 1960-2023)")
Averages by year for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(year, wc, winddir) %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=year, y=n)) +
geom_line(aes(group=value, color=value), lwd=1) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Atlanta 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Boxplots for maximum windspeed are created:
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, windspeed_10m_max) %>%
ggplot(aes(x=month, y=windspeed_10m_max)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Maximum daily wind speed (kph)",
title="Maximum daily windspeed by month (Atlanta 1960-2023)"
)
Boxplots for precipitation are created:
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_hours) %>%
ggplot(aes(x=month, y=precipitation_hours)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation hours",
title="Daily precipitation hours by month (Atlanta 1960-2023)"
)
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_sum) %>%
ggplot(aes(x=month, y=precipitation_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation (mm)",
title="Daily precipitation by month (Atlanta 1960-2023)"
)
Boxplots for temperature are created:
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_max) %>%
ggplot(aes(x=month, y=temperature_2m_max)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily high temperature (C)",
title="Daily high temperature by month (Atlanta 1960-2023)"
)
dfDailyATL %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_min) %>%
ggplot(aes(x=month, y=temperature_2m_min)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily low temperature (C)",
title="Daily low temperature by month (Atlanta 1960-2023)"
)
ACF and PACF are explored for maximum temperature:
acfTemp <- acf(dfDailyATL$temperature_2m_max, lag.max=1000)
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 365 737
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 185 549 916
pacfTemp <- pacf(dfDailyATL$temperature_2m_max, lag.max=1000)
pacf(dfDailyATL$temperature_2m_max, lag.max=50)
As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year
ACF and PACF are explored for precipitation:
acfPrecip <- acf(dfDailyATL$precipitation_sum, lag.max=1000)
acf(dfDailyATL$precipitation_sum, lag.max=50)
as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
## [1] 28 52 78 100 126 140 157 187 202 213 250 284 311 335 361 377 406 428 442
## [20] 465 481 502 523 540 553 585 609 636 650 680 692 710 728 744 757 769 791 803
## [39] 825 852 870 897 913 930 941 961 989
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 18 56 71 97 120 134 166 178 195 210 226 238 268 288 302 320 332 344 359
## [20] 383 409 420 438 452 471 483 497 533 551 565 581 598 613 633 654 676 695 719
## [39] 743 766 799 812 842 862 902 920 948 977
pacfPrecip <- pacf(dfDailyATL$precipitation_sum, lag.max=1000)
pacf(dfDailyATL$precipitation_sum, lag.max=50)
Precipitation by contrast has little seasonality and mostly just a correlation to the previous day’s value
ACF and PACF are explored for windspeed:
acfWind <- acf(dfDailyATL$windspeed_10m_max, lag.max=1000)
acf(dfDailyATL$windspeed_10m_max, lag.max=50)
as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
## [1] 13 172 183 338 361 378 553 738 894 921
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 174 185 555 730 891 910 944
pacfWind <- pacf(dfDailyATL$windspeed_10m_max, lag.max=1000)
pacf(dfDailyATL$windspeed_10m_max, lag.max=50)
Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.2 for wind speed vs. ~0.75 for temperature)
A boxplot for delta daily temperature is created:
dfDailyATL %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=temperature_2m_max-lag(temperature_2m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum temperature")
Daily temperature changes are generally larger in magnitude during the colder season
A boxplot for delta daily wind speed is created:
dfDailyATL %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=windspeed_10m_max-lag(windspeed_10m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum wind speed")
Daily wind speed changes are generally larger in magnitude during the colder season
A boxplot for delta daily precipitation is created:
dfDailyATL %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=precipitation_sum-lag(precipitation_sum)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in precipitation")
Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21 <- dfDailyATL %>%
mutate(doy=pmin(yday(date), 365)) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21 %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="none")
Outliers are likely much more important in driving average precipitation than in driving average temperature
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- dfDailyATL %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
ungroup() %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21_sd %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean)) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="none")
Means and standard deviations are plotted together:
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- daily standard deviation",
title="Rolling 21-day mean +/- one rolling 21-day sd"
) +
theme(legend.position="none")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="none")
Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.
Percentage of cumulative precipitation by volume of daily precipitation is explored:
tmpPlotData <- dfDailyATL %>%
select(date, precipitation_sum) %>%
group_by(precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n))
tmpPlotData %>%
pivot_longer(cols=-c(precipitation_sum)) %>%
ggplot(aes(x=precipitation_sum, y=value)) +
geom_line(data=~filter(., name %in% c("cpp", "cpn")),
aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
) +
labs(x="Daily precipitation (mm)", y="% total (cumul)") +
scale_color_discrete("Metric")
Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 25+ mm (about an inch) of precipitation
Total precipitation by decade is also explored, sorted by daily precipitation:
dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
group_by(decade, precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
group_by(decade) %>%
mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
ungroup() %>%
ggplot(aes(x=precipitation_sum, y=meancsp)) +
geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) +
labs(x="Daily precipitation (mm)",
y="Cumulative annual precipitation (mm)",
title="Average annual precipitation by decade",
subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
) +
scale_color_discrete("Decade")
Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 202s, driving overall precipitation totals ~50% higher
Precipitation appears to have broken trend starting in the late 2010s:
dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
group_by(decade, year) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
ggplot(aes(x=year, y=precip)) +
geom_line(aes(color=ifelse(year>=2017, "2017-2023", "pre-2017")), lwd=1) +
geom_smooth(aes(color=ifelse(year>=2017, "2017-2023", "pre-2017")), method="lm", lty=2) +
geom_point() +
labs(x=NULL, y="Annual precipitation (mm)") +
scale_color_discrete(NULL) +
lims(y=c(0, NA))
## `geom_smooth()` using formula = 'y ~ x'
Precipitation by month pre-2017 is explored:
yMax <- dfDailyATL %>%
group_by(year=year(date), month=month(date)) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
mutate(precip=ceiling(precip/50)*50) %>%
pull() %>%
max()
yMax
## [1] 350
p1 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", year)
) %>%
group_by(tp, year, month) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
group_by(tp, month) %>%
summarize(mu=mean(precip), sd=sd(precip), .groups="drop") %>%
ggplot(aes(x=month)) +
geom_col(data=~filter(., tp=="pre"), aes(y=mu), fill="lightblue") +
geom_errorbar(data=~filter(., tp=="pre"), aes(ymin=mu-sd, ymax=mu+sd), width=0.2) +
geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) +
labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation +/- 1 SD\n(pre-2017)") +
lims(y=c(0, yMax))
p1
Precipitation by month post-2017 is explored:
p2 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", year)
) %>%
group_by(tp, year, month) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
filter(tp!="pre") %>%
group_by(month) %>%
summarize(mu=mean(precip), mx=max(precip), mn=min(precip), .groups="drop") %>%
ggplot(aes(x=month)) +
geom_col(aes(y=mu), fill="lightblue") +
geom_errorbar(aes(ymin=mn, ymax=mx), width=0.2) +
geom_hline(color="red", lty=2, aes(yintercept=mean(mu))) +
labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation\nmax-mean-min (post-2017)") +
lims(y=c(0, yMax))
p2
The data are plotted on a single plot:
p3 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", "post")
) %>%
group_by(tp, year, month) %>%
summarize(precip=sum(precipitation_sum), .groups="drop") %>%
group_by(tp, month) %>%
summarize(mu=mean(precip), sd=sd(precip), mx=max(precip), mn=min(precip), .groups="drop") %>%
ggplot(aes(x=month)) +
geom_tile(data=~filter(., tp=="pre"), aes(y=mu, height=2*sd), width=0.75, fill="lightblue") +
geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) +
geom_errorbar(data=~filter(., tp=="post"), aes(ymin=mn, ymax=mx), width=0) +
geom_point(data=~filter(., tp=="post"), aes(y=mu)) +
annotate("text", x=1, y=25, label="bars are pre-2017\nmean +/- 1 SD", size=2.5, hjust=0) +
annotate("text", x=1, y=300, label="lines/points are post-2017\nmax-mean-min", size=2.5, hjust=0) +
labs(x=NULL, y="Monthly precipitation (mm)", title="Monthly precipitation\n(pre/post-2017)") +
lims(y=c(0, yMax))
p3
The average number of days with precipitation by month is explored:
yMax <- dfDailyATL %>%
group_by(year=year(date), month=month(date)) %>%
summarize(precip=sum(precipitation_sum>0), .groups="drop") %>%
mutate(precip=ceiling(precip/5)*5) %>%
pull() %>%
max()
yMax
## [1] 35
p4 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", "post")
) %>%
group_by(tp, year, month) %>%
summarize(precip=sum(precipitation_sum>0), .groups="drop") %>%
group_by(tp, month) %>%
summarize(mu=mean(precip), sd=sd(precip), mx=max(precip), mn=min(precip), .groups="drop") %>%
ggplot(aes(x=month)) +
geom_tile(data=~filter(., tp=="pre"), aes(y=mu, height=2*sd), width=0.75, fill="lightblue") +
geom_hline(data=~filter(., tp=="pre"), color="red", lty=2, aes(yintercept=mean(mu))) +
geom_errorbar(data=~filter(., tp=="post"), aes(ymin=mn, ymax=mx), width=0) +
geom_point(data=~filter(., tp=="post"), aes(y=mu)) +
annotate("text", x=1, y=5, label="bars are pre-2017\nmean +/- 1 SD", size=2.5, hjust=0) +
annotate("text", x=1, y=30, label="lines/points are post-2017\nmax-mean-min", size=2.5, hjust=0) +
labs(x=NULL,
y="Monthly precipitation (days greater than zero)",
title="Monthly days with precipitation\n(pre/post-2017)") +
lims(y=c(0, yMax))
p4
Median precipitation on days with precipitation > 0, by month, is explored:
p5 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", "post")
) %>%
group_by(tp, month) %>%
summarize(nprecip=sum(precipitation_sum>0),
tprecip=sum(precipitation_sum),
mdnprecip=median(ifelse(precipitation_sum>0, precipitation_sum, NA), na.rm=TRUE),
muprecip=mean(ifelse(precipitation_sum>0, precipitation_sum, NA), na.rm=TRUE),
.groups="drop"
) %>%
select(tp, month, mdnprecip, muprecip) %>%
pivot_longer(-c(tp, month)) %>%
ggplot(aes(x=month)) +
geom_line(aes(y=value, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])) +
facet_wrap(~c("mdnprecip"="Median (mm) on days with precipitation",
"muprecip"="Mean (mm) on days with precipitation"
)[name]
) +
labs(x=NULL,
title="Monthly precipitation on days with precipitation greater than zero\n(pre/post-2017)",
y="Precipitation (mm)"
) +
ylim(c(0, NA)) +
scale_color_discrete(NULL)
p5
The heaviest 3-day precipitation by year is explored:
p6 <- dfDailyATL %>%
select(date, precipitation_sum) %>%
mutate(year=year(date),
decade=round(year(date)-4.5, -1),
month=factor(month.abb[month(date)], levels=month.abb),
tp=ifelse(year<2017, "pre", "post")
) %>%
group_by(tp, year) %>%
helperRollingAgg(origVar="precipitation_sum", newName="r3precip", func=zoo::rollsum, k=3) %>%
summarize(r3max=max(r3precip, na.rm=TRUE), .groups="drop") %>%
ggplot(aes(x=year)) +
geom_line(aes(y=r3max, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])) +
labs(x=NULL,
title="Maximum 3-day precipitation by year",
y="rolling 3-day sum of precipitation (mm)"
) +
geom_smooth(method="lm",
se=TRUE,
aes(y=r3max, group=tp, color=c("pre"="pre-2017", "post"="post-2017")[tp])
) +
ylim(c(0, NA)) +
scale_color_discrete(NULL)
p6
## `geom_smooth()` using formula = 'y ~ x'
Long term daily data is downloaded for Detroit:
# Daily data download for Detroit, MI
testURLDaily <- helperOpenMeteoURL(cityName="Detroit MI",
dailyIndices=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31",
tz="US/Eastern"
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=42.38&longitude=-83.1&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FEastern"
# Download file
if(!file.exists("testOM_daily_dtw.json")) {
fileDownload(fileName="testOM_daily_dtw.json", url=testURLDaily)
} else {
cat("\nFile testOM_daily_dtw.json already exists, skipping download\n")
}
##
## File testOM_daily_dtw.json already exists, skipping download
The daily dataset is loaded:
# Read daily JSON file
dtwOMDaily <- formatOpenMeteoJSON("testOM_daily_dtw.json")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## $tblDaily
## # A tibble: 23,376 × 18
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <int> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 3 0.4 -4.4
## 2 1960-01-02 1960-01-02 55 3.2 -2.5
## 3 1960-01-03 1960-01-03 51 2.7 -2.9
## 4 1960-01-04 1960-01-04 3 -3.3 -8.4
## 5 1960-01-05 1960-01-05 3 -6 -9.7
## 6 1960-01-06 1960-01-06 3 -0.8 -10.7
## 7 1960-01-07 1960-01-07 3 4.1 -4.9
## 8 1960-01-08 1960-01-08 51 2.5 -6.7
## 9 1960-01-09 1960-01-09 51 1.8 -7.3
## 10 1960-01-10 1960-01-10 51 4.7 -2
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## # windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>
##
## $tblHourly
## NULL
##
## $tblUnits
## # A tibble: 17 × 4
## metricType name value description
## <chr> <chr> <chr> <chr>
## 1 daily_units time "iso8601" <NA>
## 2 daily_units weathercode "wmo code" The most severe weather co…
## 3 daily_units temperature_2m_max "deg C" Maximum and minimum daily …
## 4 daily_units temperature_2m_min "deg C" Maximum and minimum daily …
## 5 daily_units apparent_temperature_max "deg C" Maximum and minimum daily …
## 6 daily_units apparent_temperature_min "deg C" Maximum and minimum daily …
## 7 daily_units precipitation_sum "mm" Sum of daily precipitation…
## 8 daily_units rain_sum "mm" Sum of daily rain
## 9 daily_units snowfall_sum "cm" Sum of daily snowfall
## 10 daily_units precipitation_hours "h" The number of hours with r…
## 11 daily_units sunrise "iso8601" Sun rise and set times
## 12 daily_units sunset "iso8601" Sun rise and set times
## 13 daily_units windspeed_10m_max "km/h" Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max "km/h" Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg " Dominant wind direction
## 16 daily_units shortwave_radiation_sum "MJ/m²" The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm" Daily sum of ET0 Reference…
##
## $tblDescription
## # A tibble: 1 × 7
## latitude longitude generationtime_ms utc_offset_seconds timezone
## <dbl> <dbl> <dbl> <int> <chr>
## 1 42.4 -83.1 372. -18000 US/Eastern
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
##
##
## latitude: 42.35501
## longitude: -83.13785
## generationtime_ms: 371.5783
## utc_offset_seconds: -18000
## timezone: US/Eastern
## timezone_abbreviation: GMT-5
## elevation: 199
# Sample records of tibble
dtwOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <int> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ windspeed_10m_max <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…
Variables are converted to proper data type:
dfDailyDTW <- dtwOMDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
glimpse(dfDailyDTW)
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise <dttm> 1960-01-01 08:01:00, 1960-01-02 08:01:00, …
## $ sunset <dttm> 1960-01-01 17:10:00, 1960-01-02 17:10:00, …
## $ windspeed_10m_max <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…
## $ sunrise_chr <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset_chr <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ fct_winddir <fct> 109, 196, 261, 247, 257, 239, 223, 272, 161…
Averages by month for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year, month) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month)) %>%
ggplot(aes(x=month, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Detroit 1960-2023)")
Averages by month for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(month, wc, winddir) %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=month, y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Detroit 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Averages by year for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
pivot_longer(-c(year)) %>%
ggplot(aes(x=year, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Annual averages for key metrics (Detroit 1960-2023)")
Averages by year for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(year, wc, winddir) %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=year, y=n)) +
geom_line(aes(group=value, color=value), lwd=1) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Detroit 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
The apparent decrease in days with northerly winds is explored further:
dfDailyWind <- dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=300~"1. N",
winddirection_10m_dominant>240~"2. E/W",
winddirection_10m_dominant>=120~"3. S",
winddirection_10m_dominant>60~"2. E/W",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
)
) %>%
select(year, month, date, winddir, winddirection_10m_dominant, windspeed_10m_max)
dfDailyWind
## # A tibble: 23,376 × 6
## year month date winddir winddirection_10m_dominant windspeed_10m_max
## <dbl> <fct> <date> <chr> <int> <dbl>
## 1 1960 Jan 1960-01-01 2. E/W 109 15
## 2 1960 Jan 1960-01-02 3. S 196 21.5
## 3 1960 Jan 1960-01-03 2. E/W 261 25.6
## 4 1960 Jan 1960-01-04 2. E/W 247 23
## 5 1960 Jan 1960-01-05 2. E/W 257 28.7
## 6 1960 Jan 1960-01-06 3. S 239 30.3
## 7 1960 Jan 1960-01-07 3. S 223 30.9
## 8 1960 Jan 1960-01-08 2. E/W 272 32.8
## 9 1960 Jan 1960-01-09 3. S 161 18.4
## 10 1960 Jan 1960-01-10 2. E/W 284 10.2
## # ℹ 23,366 more rows
dfDailyWind %>%
count(year, winddir) %>%
group_by(year) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
ggplot(aes(x=year, y=pct)) +
geom_line(aes(group=winddir, color=winddir), lwd=1) +
labs(x=NULL,
y="Percent of Days",
title=paste0("Predominant Wind Direction", " (Detroit 1960-2023)"),
caption="N (300-060)\nE/W (061-119, 241-299)\nS (120-240)"
) +
scale_color_discrete(NULL) +
geom_hline(data=~summarize(group_by(., winddir), pct=mean(pct)),
aes(color=winddir, group=winddir, yintercept=pct),
lty=2
) +
lims(y=c(0, NA)) +
theme(legend.position = "bottom")
Boxplots for maximum windspeed are created:
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, windspeed_10m_max) %>%
ggplot(aes(x=month, y=windspeed_10m_max)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Maximum daily wind speed (kph)",
title="Maximum daily windspeed by month (Detroit 1960-2023)"
)
Boxplots for precipitation are created:
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_hours) %>%
ggplot(aes(x=month, y=precipitation_hours)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation hours",
title="Daily precipitation hours by month (Detroit 1960-2023)"
)
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_sum) %>%
ggplot(aes(x=month, y=precipitation_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation (mm)",
title="Daily precipitation by month (Detroit 1960-2023)"
)
Boxplots for temperature are created:
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_max) %>%
ggplot(aes(x=month, y=temperature_2m_max)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily high temperature (C)",
title="Daily high temperature by month (Detroit 1960-2023)"
)
dfDailyDTW %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_min) %>%
ggplot(aes(x=month, y=temperature_2m_min)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily low temperature (C)",
title="Daily low temperature by month (Detroit 1960-2023)"
)
ACF and PACF are explored for maximum temperature:
acfTemp <- acf(dfDailyDTW$temperature_2m_max, lag.max=1000)
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 367 728
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 184 551 915
pacfTemp <- pacf(dfDailyDTW$temperature_2m_max, lag.max=1000)
pacf(dfDailyDTW$temperature_2m_max, lag.max=50)
As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year
ACF and PACF are explored for precipitation:
acfPrecip <- acf(dfDailyDTW$precipitation_sum, lag.max=1000)
acf(dfDailyDTW$precipitation_sum, lag.max=50)
as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
## [1] 30 56 77 101 118 147 179 197 213 250 266 301 327 358 371 400 428 442 454
## [20] 469 489 508 520 541 553 573 588 605 618 631 644 657 678 690 727 746 769 790
## [39] 820 847 859 879 898 918 929 943 958 974
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 18 32 60 86 114 137 162 189 209 247 261 280 296 325 340 354 386 410 426
## [20] 451 462 494 515 535 550 568 583 603 615 652 680 708 725 742 771 806 831 852
## [39] 875 907 922 934 956 977
pacfPrecip <- pacf(dfDailyDTW$precipitation_sum, lag.max=1000)
pacf(dfDailyDTW$precipitation_sum, lag.max=50)
Precipitation by contrast has little seasonality and mostly just a correlation to the previous day’s value
ACF and PACF are explored for wind speed:
acfWind <- acf(dfDailyDTW$windspeed_10m_max, lag.max=1000)
acf(dfDailyDTW$windspeed_10m_max, lag.max=50)
as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
## [1] 28 41 73 142 167 182 202 228 239 281 316 330 354 374 408 476 490 523 537
## [20] 564 620 695 717 737 834 852 873 900 952 977
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 31 87 136 154 169 191 210 253 320 356 376 404 451 483 502 518 556 575 588
## [20] 600 753 839 869 916 935 948
pacfWind <- pacf(dfDailyDTW$windspeed_10m_max, lag.max=1000)
pacf(dfDailyDTW$windspeed_10m_max, lag.max=50)
Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.1 for wind speed vs. ~0.8 for temperature)
A boxplot for delta daily temperature is created:
dfDailyDTW %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=temperature_2m_max-lag(temperature_2m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum temperature")
Daily temperature changes are generally larger in magnitude during the colder season
A boxplot for delta daily wind speed is created:
dfDailyDTW %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=windspeed_10m_max-lag(windspeed_10m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum wind speed")
Daily wind speed changes are generally larger in magnitude during the colder season
A boxplot for delta daily precipitation is created:
dfDailyDTW %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=precipitation_sum-lag(precipitation_sum)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in precipitation")
Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21 <- dfDailyDTW %>%
mutate(doy=pmin(yday(date), 365)) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21 %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="none")
Outliers are likely much more important in driving average precipitation than in driving average temperature
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- dfDailyDTW %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
ungroup() %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21_sd %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean)) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="none")
Means and standard deviations are plotted together:
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- daily standard deviation",
title="Rolling 21-day mean +/- one rolling 21-day sd"
) +
theme(legend.position="none")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="none")
Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.
Percentage of cumulative precipitation by volume of daily precipitation is explored:
tmpPlotData <- dfDailyDTW %>%
select(date, precipitation_sum) %>%
group_by(precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n))
tmpPlotData %>%
pivot_longer(cols=-c(precipitation_sum)) %>%
ggplot(aes(x=precipitation_sum, y=value)) +
geom_line(data=~filter(., name %in% c("cpp", "cpn")),
aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
) +
labs(x="Daily precipitation (mm)", y="% total (cumul)") +
scale_color_discrete("Metric")
Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 15+ mm (above about half an inch) of precipitation
Total precipitation by decade is also explored, sorted by daily precipitation:
dfDailyDTW %>%
select(date, precipitation_sum) %>%
mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
group_by(decade, precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
group_by(decade) %>%
mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
ungroup() %>%
ggplot(aes(x=precipitation_sum, y=meancsp)) +
geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) +
labs(x="Daily precipitation (mm)",
y="Cumulative annual precipitation (mm)",
title="Average annual precipitation by decade",
subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
) +
scale_color_discrete("Decade")
Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 2020s and lightest in the 1960s, driving differences in annual precipitation
Long term daily data is downloaded for Chicago:
# Daily data download for Chicago, IL
testURLDaily <- helperOpenMeteoURL(cityName="Chicago IL",
dailyIndices=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31",
tz="US/Central"
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=41.84&longitude=-87.68&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
# Download file
if(!file.exists("testOM_daily_ord.json")) {
fileDownload(fileName="testOM_daily_ord.json", url=testURLDaily)
} else {
cat("\nFile testOM_daily_ord.json already exists, skipping download\n")
}
##
## File testOM_daily_ord.json already exists, skipping download
The daily dataset is loaded:
# Read daily JSON file
ordOMDaily <- formatOpenMeteoJSON("testOM_daily_ord.json")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## $tblDaily
## # A tibble: 23,376 × 18
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <int> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 3 1.6 -3.5
## 2 1960-01-02 1960-01-02 51 4.9 -0.9
## 3 1960-01-03 1960-01-03 3 -0.6 -7.5
## 4 1960-01-04 1960-01-04 2 -5.9 -11.8
## 5 1960-01-05 1960-01-05 3 -6.1 -11.2
## 6 1960-01-06 1960-01-06 1 1 -10.3
## 7 1960-01-07 1960-01-07 3 4.9 -3.2
## 8 1960-01-08 1960-01-08 3 1.6 -2.8
## 9 1960-01-09 1960-01-09 51 7.1 -3.4
## 10 1960-01-10 1960-01-10 51 4.2 -1
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## # windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>
##
## $tblHourly
## NULL
##
## $tblUnits
## # A tibble: 17 × 4
## metricType name value description
## <chr> <chr> <chr> <chr>
## 1 daily_units time "iso8601" <NA>
## 2 daily_units weathercode "wmo code" The most severe weather co…
## 3 daily_units temperature_2m_max "deg C" Maximum and minimum daily …
## 4 daily_units temperature_2m_min "deg C" Maximum and minimum daily …
## 5 daily_units apparent_temperature_max "deg C" Maximum and minimum daily …
## 6 daily_units apparent_temperature_min "deg C" Maximum and minimum daily …
## 7 daily_units precipitation_sum "mm" Sum of daily precipitation…
## 8 daily_units rain_sum "mm" Sum of daily rain
## 9 daily_units snowfall_sum "cm" Sum of daily snowfall
## 10 daily_units precipitation_hours "h" The number of hours with r…
## 11 daily_units sunrise "iso8601" Sun rise and set times
## 12 daily_units sunset "iso8601" Sun rise and set times
## 13 daily_units windspeed_10m_max "km/h" Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max "km/h" Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg " Dominant wind direction
## 16 daily_units shortwave_radiation_sum "MJ/m²" The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm" Daily sum of ET0 Reference…
##
## $tblDescription
## # A tibble: 1 × 7
## latitude longitude generationtime_ms utc_offset_seconds timezone
## <dbl> <dbl> <dbl> <int> <chr>
## 1 41.9 -87.6 369. -21600 US/Central
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
##
##
## latitude: 41.86292
## longitude: -87.64877
## generationtime_ms: 368.9336
## utc_offset_seconds: -21600
## timezone: US/Central
## timezone_abbreviation: GMT-6
## elevation: 180
# Sample records of tibble
ordOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <int> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ windspeed_10m_max <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…
Variables are converted to proper data type:
dfDailyORD <- ordOMDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
glimpse(dfDailyORD)
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise <dttm> 1960-01-01 07:18:00, 1960-01-02 07:18:00, …
## $ sunset <dttm> 1960-01-01 16:29:00, 1960-01-02 16:30:00, …
## $ windspeed_10m_max <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…
## $ sunrise_chr <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset_chr <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ fct_winddir <fct> 142, 214, 268, 247, 261, 232, 234, 275, 185…
Averages by month for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year, month) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month)) %>%
ggplot(aes(x=month, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Chicago 1960-2023)")
Averages by month for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(month, wc, winddir) %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=month, y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Chicago 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Averages by year for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
pivot_longer(-c(year)) %>%
ggplot(aes(x=year, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Annual averages for key metrics (Chicago 1960-2023)")
Averages by year for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(year, wc, winddir) %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=year, y=n)) +
geom_line(aes(group=value, color=value), lwd=1) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Chicago 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Boxplots for maximum windspeed are created:
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, windspeed_10m_max) %>%
ggplot(aes(x=month, y=windspeed_10m_max)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Maximum daily wind speed (kph)",
title="Maximum daily windspeed by month (Chicago 1960-2023)"
)
Boxplots for precipitation are created:
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_hours) %>%
ggplot(aes(x=month, y=precipitation_hours)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation hours",
title="Daily precipitation hours by month (Chicago 1960-2023)"
)
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_sum) %>%
ggplot(aes(x=month, y=precipitation_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation (mm)",
title="Daily precipitation by month (Chicago 1960-2023)"
)
Boxplots for snow are also created:
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, snowfall_sum) %>%
ggplot(aes(x=month, y=snowfall_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily snowfall (liquid equivalent mm)",
title="Daily snowfall by month (Chicago 1960-2023)"
)
Boxplots for temperature are created:
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_max) %>%
ggplot(aes(x=month, y=temperature_2m_max)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily high temperature (C)",
title="Daily high temperature by month (Chicago 1960-2023)"
)
dfDailyORD %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_min) %>%
ggplot(aes(x=month, y=temperature_2m_min)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily low temperature (C)",
title="Daily low temperature by month (Chicago 1960-2023)"
)
ACF and PACF are explored for maximum temperature:
acfTemp <- acf(dfDailyORD$temperature_2m_max, lag.max=1000)
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 366 728
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 180 551 915
pacfTemp <- pacf(dfDailyORD$temperature_2m_max, lag.max=1000)
pacf(dfDailyORD$temperature_2m_max, lag.max=50)
As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year
ACF and PACF are explored for precipitation:
acfPrecip <- acf(dfDailyORD$precipitation_sum, lag.max=1000)
acf(dfDailyORD$precipitation_sum, lag.max=50)
as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
## [1] 42 80 94 122 167 179 205 216 229 251 268 288 302 332 350 388 413 443 454
## [20] 469 492 528 570 599 612 633 654 690 730 758 770 789 808 820 845 865 881 915
## [39] 926 940 951 976
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 24 49 67 86 114 137 163 181 192 218 231 248 260 281 311 324 340 355 375
## [20] 400 430 451 462 482 521 535 549 560 583 597 616 631 662 687 713 736 757 776
## [39] 795 822 853 875 889 934 949 964 979
pacfPrecip <- pacf(dfDailyORD$precipitation_sum, lag.max=1000)
pacf(dfDailyORD$precipitation_sum, lag.max=50)
Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value
ACF and PACF are explored for wind speed:
acfWind <- acf(dfDailyORD$windspeed_10m_max, lag.max=1000)
acf(dfDailyORD$windspeed_10m_max, lag.max=50)
as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
## [1] 149 167 182 207 228 239 353 373 408 476 523 538 554 572 594 620 722 737 852
## [20] 883 901 926 953
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 112 126 137 154 169 190 217 253 278 376 469 483 506 518 556 589 633 725 739
## [20] 753 847 862 879 916 939 959 976
pacfWind <- pacf(dfDailyORD$windspeed_10m_max, lag.max=1000)
pacf(dfDailyORD$windspeed_10m_max, lag.max=50)
Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.1 for wind speed vs. ~0.8 for temperature)
A boxplot for delta daily temperature is created:
dfDailyORD %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=temperature_2m_max-lag(temperature_2m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum temperature")
Daily temperature changes are generally larger in magnitude during the colder season
A boxplot for delta daily wind speed is created:
dfDailyORD %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=windspeed_10m_max-lag(windspeed_10m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum wind speed")
Daily wind speed changes are generally similar in magnitude by season
A boxplot for delta daily precipitation is created:
dfDailyORD %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=precipitation_sum-lag(precipitation_sum)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in precipitation")
Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21 <- dfDailyORD %>%
mutate(doy=pmin(yday(date), 365)) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21 %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="none")
Outliers are likely much more important in driving average precipitation than in driving average temperature
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- dfDailyORD %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
ungroup() %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21_sd %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean)) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="none")
Means and standard deviations are plotted together:
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- daily standard deviation",
title="Rolling 21-day mean +/- one rolling 21-day sd"
) +
theme(legend.position="none")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="none")
Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.
Percentage of cumulative precipitation by volume of daily precipitation is explored:
tmpPlotData <- dfDailyORD %>%
select(date, precipitation_sum) %>%
group_by(precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n))
tmpPlotData %>%
pivot_longer(cols=-c(precipitation_sum)) %>%
ggplot(aes(x=precipitation_sum, y=value)) +
geom_line(data=~filter(., name %in% c("cpp", "cpn")),
aes(group=name, color=c("cpn"="# events", "cpp"="precip")[name])
) +
labs(x="Daily precipitation (mm)", y="% total (cumul)") +
scale_color_discrete("Metric")
Around half of days have no precipitation. Around 25% of total precipitation comes from the rare day with 20+ mm (nearly an inch) of precipitation
Total precipitation by decade is also explored, sorted by daily precipitation:
dfDailyORD %>%
select(date, precipitation_sum) %>%
mutate(year=year(date), decade=round(year(date)-4.5, -1)) %>%
group_by(decade, precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
group_by(decade) %>%
mutate(csp=cumsum(precip), ntot=sum(n), meancsp=365*csp/ntot) %>%
ungroup() %>%
ggplot(aes(x=precipitation_sum, y=meancsp)) +
geom_line(aes(group=factor(decade), color=factor(decade)), lwd=1) +
labs(x="Daily precipitation (mm)",
y="Cumulative annual precipitation (mm)",
title="Average annual precipitation by decade",
subtitle="Cumulative, for daily precipitation amounts LTE x-axis"
) +
scale_color_discrete("Decade")
Trends look similar by decade for precipitation amounts under 0.5 inch (12.5 mm). Heavier precipitation amounts appear to be more prevalent in the 2010s/2020s and lightest in the 1960s, driving differences in annual precipitation
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window, for Atlanta, Detroit, and Chicago:
df_r21 <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
mutate(doy=pmin(yday(date), 365),
src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]
) %>%
group_by(src) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(src, date, doy, contains("_r21")) %>%
ungroup()
df_r21 %>%
na.omit() %>%
group_by(src, doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n(), .groups="drop") %>%
pivot_longer(cols=-c(src, doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=src, color=src), lwd=1) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="bottom") +
scale_color_discrete(NULL)
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
mutate(doy=pmin(yday(date), 365),
src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]
) %>%
group_by(src, doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
group_by(src) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(src, date, doy, contains("_r21")) %>%
ungroup()
df_r21_sd %>%
na.omit() %>%
group_by(src, doy) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(src, doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=src, color=src), lwd=1) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="bottom")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="typ") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[typ]) %>%
group_by(src, doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(src, doy, musig)) %>%
pivot_wider(id_cols=c(src, doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=src, color=src), lty=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=src), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="bottom")
Percentage of cumulative precipitation by volume of daily precipitation is explored, for three cities:
tmpPlotData <- bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
mutate(src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]) %>%
select(src, date, precipitation_sum) %>%
group_by(src, precipitation_sum) %>%
summarize(n=n(), precip=sum(precipitation_sum), .groups="drop") %>%
group_by(src) %>%
mutate(csp=cumsum(precip), cpp=csp/sum(precip), csn=cumsum(n), cpn=csn/sum(n)) %>%
ungroup()
tmpPlotData %>%
pivot_longer(cols=-c(src, precipitation_sum)) %>%
filter(name %in% c("cpp", "cpn")) %>%
ggplot(aes(x=precipitation_sum, y=value)) +
geom_line(aes(group=src, color=src)) +
labs(x="Daily precipitation (mm)", y="% total (cumul)") +
scale_color_discrete("Metric") +
facet_wrap(~c("cpn"="# events", "cpp"="precip")[name])
Around half of days have no precipitation. In all three cities, around 25% of total precipitation comes from the rare day with 15-20+ mm (half inch to nearly an inch) of precipitation
Frequency of precipitation by change in temperature is explored:
bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
mutate(src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]) %>%
select(src, date, precipitation_sum, temperature_2m_max) %>%
group_by(src) %>%
mutate(isPrecip=precipitation_sum>0,
dTemp=temperature_2m_max-lag(temperature_2m_max),
month=factor(month.abb[month(date)], levels=month.abb)
) %>%
ungroup() %>%
filter(complete.cases(.)) %>%
mutate(dTempRound=autoRound(dTemp, rndTo=2.5), dTempRound=pmax(-10, pmin(10, dTempRound))) %>%
group_by(src, dTempRound) %>%
summarize(pctPrecip=mean(isPrecip), muPrecip=mean(precipitation_sum), n=n(), .groups="drop") %>%
mutate(sdpctPrecip=sqrt(pctPrecip*(1-pctPrecip)/n)) %>%
ggplot(aes(x=dTempRound)) +
geom_line(aes(y=pctPrecip, color=src, group=src)) +
geom_ribbon(aes(ymin=pctPrecip-sdpctPrecip, ymax=pctPrecip+sdpctPrecip, fill=src, group=src), alpha=0.5) +
labs(x="Temperature Change from Yesterday (C)\n(rounded to nearest 2.5, capped at +/- 10)",
y="% Days with Precipitation > 0\n(mean +/- 1 SE)",
title="Frequency of precipitation by change in temperature"
) +
lims(y=c(0, 1)) +
scale_fill_discrete(NULL) +
scale_color_discrete(guide="none")
Month is a possible confounder, since temperature change and precipitation both vary by season:
bind_rows(dfDailyATL, dfDailyORD, dfDailyDTW, .id="src") %>%
mutate(src=c("1"="Atlanta", "2"="Chicago", "3"="Detroit")[src]) %>%
select(src, date, precipitation_sum, temperature_2m_max) %>%
group_by(src) %>%
mutate(isPrecip=precipitation_sum>0,
dTemp=temperature_2m_max-lag(temperature_2m_max),
month=factor(month.abb[month(date)], levels=month.abb)
) %>%
ungroup() %>%
filter(complete.cases(.)) %>%
mutate(dTempRound=autoRound(dTemp, rndTo=2.5), dTempRound=pmax(-10, pmin(10, dTempRound))) %>%
group_by(src, month, dTempRound) %>%
summarize(pctPrecip=mean(isPrecip), muPrecip=mean(precipitation_sum), n=n(), .groups="drop") %>%
mutate(sdpctPrecip=sqrt(pctPrecip*(1-pctPrecip)/n)) %>%
filter(n>=5) %>%
ggplot(aes(x=dTempRound)) +
geom_line(aes(y=pctPrecip, color=src, group=src)) +
geom_ribbon(aes(ymin=pctPrecip-sdpctPrecip, ymax=pctPrecip+sdpctPrecip, fill=src, group=src), alpha=0.5) +
facet_wrap(~month) +
labs(x="Temperature Change from Yesterday (C)\n(rounded to nearest 2.5, capped at +/- 10)",
y="% Days with Precipitation > 0\n(mean +/- 1 SE)",
title="Frequency of precipitation by change in temperature"
) +
lims(y=c(0, 1)) +
scale_fill_discrete(NULL) +
scale_color_discrete(guide="none")
At a glance, there appears to be some association between temperature change and frequency of precipitation, even after controlling for city and month
Long term daily data is downloaded for New Orleans:
# Daily data download for New Orleans, LA
testURLDaily <- helperOpenMeteoURL(cityName="New Orleans LA",
dailyIndices=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31",
tz="US/Central"
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=30.07&longitude=-89.93&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
# Download file
if(!file.exists("testOM_daily_msy.json")) {
fileDownload(fileName="testOM_daily_msy.json", url=testURLDaily)
} else {
cat("\nFile testOM_daily_msy.json already exists, skipping download\n")
}
##
## File testOM_daily_msy.json already exists, skipping download
The daily dataset is loaded:
# Read daily JSON file
msyOMDaily <- formatOpenMeteoJSON("testOM_daily_msy.json")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## $tblDaily
## # A tibble: 23,376 × 18
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <int> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 61 15.5 10.9
## 2 1960-01-02 1960-01-02 63 19.1 12.3
## 3 1960-01-03 1960-01-03 51 17.2 8.6
## 4 1960-01-04 1960-01-04 3 12.1 6.4
## 5 1960-01-05 1960-01-05 51 17.9 11.3
## 6 1960-01-06 1960-01-06 63 18.2 10.3
## 7 1960-01-07 1960-01-07 51 10.3 6.1
## 8 1960-01-08 1960-01-08 3 11.7 4.6
## 9 1960-01-09 1960-01-09 3 15.8 7.5
## 10 1960-01-10 1960-01-10 51 19.1 12.8
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## # windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>
##
## $tblHourly
## NULL
##
## $tblUnits
## # A tibble: 17 × 4
## metricType name value description
## <chr> <chr> <chr> <chr>
## 1 daily_units time "iso8601" <NA>
## 2 daily_units weathercode "wmo code" The most severe weather co…
## 3 daily_units temperature_2m_max "deg C" Maximum and minimum daily …
## 4 daily_units temperature_2m_min "deg C" Maximum and minimum daily …
## 5 daily_units apparent_temperature_max "deg C" Maximum and minimum daily …
## 6 daily_units apparent_temperature_min "deg C" Maximum and minimum daily …
## 7 daily_units precipitation_sum "mm" Sum of daily precipitation…
## 8 daily_units rain_sum "mm" Sum of daily rain
## 9 daily_units snowfall_sum "cm" Sum of daily snowfall
## 10 daily_units precipitation_hours "h" The number of hours with r…
## 11 daily_units sunrise "iso8601" Sun rise and set times
## 12 daily_units sunset "iso8601" Sun rise and set times
## 13 daily_units windspeed_10m_max "km/h" Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max "km/h" Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg " Dominant wind direction
## 16 daily_units shortwave_radiation_sum "MJ/m²" The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm" Daily sum of ET0 Reference…
##
## $tblDescription
## # A tibble: 1 × 7
## latitude longitude generationtime_ms utc_offset_seconds timezone
## <dbl> <dbl> <dbl> <int> <chr>
## 1 30.1 -89.9 7165. -18000 US/Central
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
##
##
## latitude: 30.05272
## longitude: -89.89499
## generationtime_ms: 7164.689
## utc_offset_seconds: -18000
## timezone: US/Central
## timezone_abbreviation: GMT-5
## elevation: 0
# Sample records of tibble
msyOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <int> 61, 63, 51, 3, 51, 63, 51, 3, 3, 51, 3, 3, …
## $ temperature_2m_max <dbl> 15.5, 19.1, 17.2, 12.1, 17.9, 18.2, 10.3, 1…
## $ temperature_2m_min <dbl> 10.9, 12.3, 8.6, 6.4, 11.3, 10.3, 6.1, 4.6,…
## $ apparent_temperature_max <dbl> 13.6, 20.8, 15.8, 9.2, 20.1, 20.2, 7.7, 9.1…
## $ apparent_temperature_min <dbl> 6.6, 9.8, 4.1, 2.7, 8.0, 7.3, 2.5, 1.7, 4.8…
## $ precipitation_sum <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ rain_sum <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ snowfall_sum <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ precipitation_hours <dbl> 12, 12, 2, 0, 7, 19, 1, 0, 0, 1, 0, 0, 0, 8…
## $ sunrise <chr> "1960-01-01T07:55", "1960-01-02T07:55", "19…
## $ sunset <chr> "1960-01-01T18:10", "1960-01-02T18:11", "19…
## $ windspeed_10m_max <dbl> 28.9, 18.8, 29.8, 16.2, 21.0, 19.7, 22.7, 1…
## $ windgusts_10m_max <dbl> 53.6, 36.4, 50.0, 27.7, 34.6, 59.4, 39.2, 2…
## $ winddirection_10m_dominant <int> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## $ shortwave_radiation_sum <dbl> 7.86, 4.91, 13.14, 13.63, 4.33, 1.05, 10.67…
## $ et0_fao_evapotranspiration <dbl> 1.05, 0.79, 2.22, 2.17, 1.14, 0.28, 1.30, 1…
Variables are converted to proper data type:
dfDailyMSY <- msyOMDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
glimpse(dfDailyMSY)
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 61, 63, 51, 3, 51, 63, 51, 3, 3, 51, 3, 3, …
## $ temperature_2m_max <dbl> 15.5, 19.1, 17.2, 12.1, 17.9, 18.2, 10.3, 1…
## $ temperature_2m_min <dbl> 10.9, 12.3, 8.6, 6.4, 11.3, 10.3, 6.1, 4.6,…
## $ apparent_temperature_max <dbl> 13.6, 20.8, 15.8, 9.2, 20.1, 20.2, 7.7, 9.1…
## $ apparent_temperature_min <dbl> 6.6, 9.8, 4.1, 2.7, 8.0, 7.3, 2.5, 1.7, 4.8…
## $ precipitation_sum <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ rain_sum <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ snowfall_sum <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ precipitation_hours <dbl> 12, 12, 2, 0, 7, 19, 1, 0, 0, 1, 0, 0, 0, 8…
## $ sunrise <dttm> 1960-01-01 07:55:00, 1960-01-02 07:55:00, …
## $ sunset <dttm> 1960-01-01 18:10:00, 1960-01-02 18:11:00, …
## $ windspeed_10m_max <dbl> 28.9, 18.8, 29.8, 16.2, 21.0, 19.7, 22.7, 1…
## $ windgusts_10m_max <dbl> 53.6, 36.4, 50.0, 27.7, 34.6, 59.4, 39.2, 2…
## $ winddirection_10m_dominant <int> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## $ shortwave_radiation_sum <dbl> 7.86, 4.91, 13.14, 13.63, 4.33, 1.05, 10.67…
## $ et0_fao_evapotranspiration <dbl> 1.05, 0.79, 2.22, 2.17, 1.14, 0.28, 1.30, 1…
## $ sunrise_chr <chr> "1960-01-01T07:55", "1960-01-02T07:55", "19…
## $ sunset_chr <chr> "1960-01-01T18:10", "1960-01-02T18:11", "19…
## $ fct_winddir <fct> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
Averages by month for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyMSY %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year, month) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month)) %>%
ggplot(aes(x=month, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Monthly averages for key metrics (New Orleans 1960-2023)")
Averages by month for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyMSY %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(month, wc, winddir) %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=month, y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (New Orleans 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Averages by year for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyMSY %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
pivot_longer(-c(year)) %>%
ggplot(aes(x=year, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Annual averages for key metrics (New Orleans 1960-2023)")
Averages by year for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyMSY %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(year, wc, winddir) %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=year, y=n)) +
geom_line(aes(group=value, color=value), lwd=1) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (New Orleans 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Boxplots for maximum windspeed are created:
dfDailyMSY %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, windspeed_10m_max) %>%
ggplot(aes(x=month, y=windspeed_10m_max)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Maximum daily wind speed (kph)",
title="Maximum daily windspeed by month (New Orleans 1960-2023)"
)
Boxplots for precipitation are created:
dfDailyMSY %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_hours) %>%
ggplot(aes(x=month, y=precipitation_hours)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation hours",
title="Daily precipitation hours by month (New Orleans 1960-2023)"
)
dfDailyMSY %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_sum) %>%
ggplot(aes(x=month, y=precipitation_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation (mm)",
title="Daily precipitation by month (New Orleans 1960-2023)"
)
Boxplots for temperature are created:
dfDailyMSY %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_max) %>%
ggplot(aes(x=month, y=temperature_2m_max)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily high temperature (C)",
title="Daily high temperature by month (New Orleans 1960-2023)"
)
dfDailyMSY %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_min) %>%
ggplot(aes(x=month, y=temperature_2m_min)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily low temperature (C)",
title="Daily low temperature by month (New Orleans 1960-2023)"
)
ACF and PACF are explored for maximum temperature:
acfTemp <- acf(dfDailyMSY$temperature_2m_max, lag.max=1000)
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 371 736
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 185 546 917
pacfTemp <- pacf(dfDailyMSY$temperature_2m_max, lag.max=1000)
pacf(dfDailyMSY$temperature_2m_max, lag.max=50)
As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year
ACF and PACF are explored for precipitation:
acfPrecip <- acf(dfDailyMSY$precipitation_sum, lag.max=1000)
acf(dfDailyMSY$precipitation_sum, lag.max=50)
as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
## [1] 18 70 99 113 127 139 155 179 203 227 241 264 327 362 382 403 433 452 465
## [20] 486 530 544 565 577 628 641 670 690 703 721 737 761 792 817 842 854 880 892
## [39] 909 937 967 989
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 22 50 64 82 104 131 150 164 176 196 223 247 261 290 307 329 345 376 388
## [20] 412 428 439 456 474 492 526 552 568 580 592 625 636 659 680 700 725 741 757
## [39] 775 797 809 830 846 872 885 906 918 944 971
pacfPrecip <- pacf(dfDailyMSY$precipitation_sum, lag.max=1000)
pacf(dfDailyMSY$precipitation_sum, lag.max=50)
Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value
ACF and PACF are explored for wind speed:
acfWind <- acf(dfDailyMSY$windspeed_10m_max, lag.max=1000)
acf(dfDailyMSY$windspeed_10m_max, lag.max=50)
as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
## [1] 170 207 227 333 353 542 579 713 725 741 767 893 920 938
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 164 191 365 532 549 583 719 734 869 911 922
pacfWind <- pacf(dfDailyMSY$windspeed_10m_max, lag.max=1000)
pacf(dfDailyMSY$windspeed_10m_max, lag.max=50)
Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are much lower in magnitude (~0.2 for wind speed vs. ~0.8 for temperature)
A boxplot for delta daily temperature is created:
dfDailyMSY %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=temperature_2m_max-lag(temperature_2m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum temperature")
Daily temperature changes are generally larger in magnitude during the colder season
A boxplot for delta daily wind speed is created:
dfDailyMSY %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=windspeed_10m_max-lag(windspeed_10m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum wind speed")
Daily wind speed changes are generally similar in magnitude by season, though with more outliers in Aug-Oct
A boxplot for delta daily precipitation is created:
dfDailyMSY %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=precipitation_sum-lag(precipitation_sum)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in precipitation")
Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21 <- dfDailyMSY %>%
mutate(doy=pmin(yday(date), 365)) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21 %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="none")
Outliers are likely much more important in driving average precipitation than in driving average temperature
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- dfDailyMSY %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
ungroup() %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21_sd %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean)) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="none")
Means and standard deviations are plotted together:
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- daily standard deviation",
title="Rolling 21-day mean +/- one rolling 21-day sd"
) +
theme(legend.position="none")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="none")
Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. By contrast, while there is apparent seasonal spikiness to precipitation, rolling 21-day means are usually within one standard error of the mean of each other. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude.
Long term daily data is downloaded for Denver:
# Daily data download for Denver, CO
testURLDaily <- helperOpenMeteoURL(cityName="Denver CO",
dailyIndices=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31",
tz="US/Mountain"
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=39.77&longitude=-104.87&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FMountain"
# Download file
if(!file.exists("testOM_daily_den.json")) {
fileDownload(fileName="testOM_daily_den.json", url=testURLDaily)
} else {
cat("\nFile testOM_daily_den.json already exists, skipping download\n")
}
##
## File testOM_daily_den.json already exists, skipping download
The daily dataset is loaded:
# Read daily JSON file
denOMDaily <- formatOpenMeteoJSON("testOM_daily_den.json")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## $tblDaily
## # A tibble: 23,376 × 18
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <int> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 75 -4.5 -11
## 2 1960-01-02 1960-01-02 3 -5.7 -19.2
## 3 1960-01-03 1960-01-03 1 -7.4 -22.5
## 4 1960-01-04 1960-01-04 3 -1.9 -22.8
## 5 1960-01-05 1960-01-05 1 -1.5 -20.8
## 6 1960-01-06 1960-01-06 3 3.4 -16
## 7 1960-01-07 1960-01-07 1 5.4 -12.7
## 8 1960-01-08 1960-01-08 3 9.6 -9.1
## 9 1960-01-09 1960-01-09 3 11.7 -5.6
## 10 1960-01-10 1960-01-10 3 9.2 -4.1
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## # windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>
##
## $tblHourly
## NULL
##
## $tblUnits
## # A tibble: 17 × 4
## metricType name value description
## <chr> <chr> <chr> <chr>
## 1 daily_units time "iso8601" <NA>
## 2 daily_units weathercode "wmo code" The most severe weather co…
## 3 daily_units temperature_2m_max "deg C" Maximum and minimum daily …
## 4 daily_units temperature_2m_min "deg C" Maximum and minimum daily …
## 5 daily_units apparent_temperature_max "deg C" Maximum and minimum daily …
## 6 daily_units apparent_temperature_min "deg C" Maximum and minimum daily …
## 7 daily_units precipitation_sum "mm" Sum of daily precipitation…
## 8 daily_units rain_sum "mm" Sum of daily rain
## 9 daily_units snowfall_sum "cm" Sum of daily snowfall
## 10 daily_units precipitation_hours "h" The number of hours with r…
## 11 daily_units sunrise "iso8601" Sun rise and set times
## 12 daily_units sunset "iso8601" Sun rise and set times
## 13 daily_units windspeed_10m_max "km/h" Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max "km/h" Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg " Dominant wind direction
## 16 daily_units shortwave_radiation_sum "MJ/m²" The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm" Daily sum of ET0 Reference…
##
## $tblDescription
## # A tibble: 1 × 7
## latitude longitude generationtime_ms utc_offset_seconds timezone
## <dbl> <dbl> <dbl> <int> <chr>
## 1 39.8 -105. 8310. -21600 US/Mountain
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
##
##
## latitude: 39.75395
## longitude: -104.8957
## generationtime_ms: 8310.172
## utc_offset_seconds: -21600
## timezone: US/Mountain
## timezone_abbreviation: GMT-6
## elevation: 1608
# Sample records of tibble
denOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <int> 75, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 51, 73, 7…
## $ temperature_2m_max <dbl> -4.5, -5.7, -7.4, -1.9, -1.5, 3.4, 5.4, 9.6…
## $ temperature_2m_min <dbl> -11.0, -19.2, -22.5, -22.8, -20.8, -16.0, -…
## $ apparent_temperature_max <dbl> -8.9, -10.0, -11.9, -6.2, -5.5, -0.3, 0.5, …
## $ apparent_temperature_min <dbl> -15.8, -24.5, -28.3, -28.8, -26.4, -21.4, -…
## $ precipitation_sum <dbl> 8.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum <dbl> 5.88, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 17,…
## $ sunrise <chr> "1960-01-01T08:20", "1960-01-02T08:20", "19…
## $ sunset <chr> "1960-01-01T17:45", "1960-01-02T17:46", "19…
## $ windspeed_10m_max <dbl> 22.1, 9.8, 13.3, 14.5, 14.3, 13.4, 16.1, 17…
## $ windgusts_10m_max <dbl> 36.0, 15.8, 21.2, 23.4, 21.2, 22.3, 25.9, 2…
## $ winddirection_10m_dominant <int> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
## $ shortwave_radiation_sum <dbl> 6.21, 10.29, 10.29, 11.02, 10.95, 10.69, 10…
## $ et0_fao_evapotranspiration <dbl> 0.51, 0.73, 0.71, 0.89, 0.90, 1.17, 1.26, 1…
Variables are converted to proper data type:
dfDailyDEN <- denOMDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
glimpse(dfDailyDEN)
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 75, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 51, 73, 7…
## $ temperature_2m_max <dbl> -4.5, -5.7, -7.4, -1.9, -1.5, 3.4, 5.4, 9.6…
## $ temperature_2m_min <dbl> -11.0, -19.2, -22.5, -22.8, -20.8, -16.0, -…
## $ apparent_temperature_max <dbl> -8.9, -10.0, -11.9, -6.2, -5.5, -0.3, 0.5, …
## $ apparent_temperature_min <dbl> -15.8, -24.5, -28.3, -28.8, -26.4, -21.4, -…
## $ precipitation_sum <dbl> 8.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum <dbl> 5.88, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 17,…
## $ sunrise <dttm> 1960-01-01 08:20:00, 1960-01-02 08:20:00, …
## $ sunset <dttm> 1960-01-01 17:45:00, 1960-01-02 17:46:00, …
## $ windspeed_10m_max <dbl> 22.1, 9.8, 13.3, 14.5, 14.3, 13.4, 16.1, 17…
## $ windgusts_10m_max <dbl> 36.0, 15.8, 21.2, 23.4, 21.2, 22.3, 25.9, 2…
## $ winddirection_10m_dominant <int> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
## $ shortwave_radiation_sum <dbl> 6.21, 10.29, 10.29, 11.02, 10.95, 10.69, 10…
## $ et0_fao_evapotranspiration <dbl> 0.51, 0.73, 0.71, 0.89, 0.90, 1.17, 1.26, 1…
## $ sunrise_chr <chr> "1960-01-01T08:20", "1960-01-02T08:20", "19…
## $ sunset_chr <chr> "1960-01-01T17:45", "1960-01-02T17:46", "19…
## $ fct_winddir <fct> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
Averages by month for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year, month) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month)) %>%
ggplot(aes(x=month, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Denver 1960-2023)")
Averages by month for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(month, wc, winddir) %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=month, y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Denver 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Averages by year for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
pivot_longer(-c(year)) %>%
ggplot(aes(x=year, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Annual averages for key metrics (Denver 1960-2023)")
Averages by year for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(year, wc, winddir) %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=year, y=n)) +
geom_line(aes(group=value, color=value), lwd=1) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Denver 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Boxplots for maximum windspeed are created:
dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, windspeed_10m_max) %>%
ggplot(aes(x=month, y=windspeed_10m_max)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Maximum daily wind speed (kph)",
title="Maximum daily windspeed by month (Denver 1960-2023)"
)
Boxplots for precipitation are created:
dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_hours) %>%
ggplot(aes(x=month, y=precipitation_hours)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation hours",
title="Daily precipitation hours by month (Denver 1960-2023)"
)
dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_sum) %>%
ggplot(aes(x=month, y=precipitation_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation (mm)",
title="Daily precipitation by month (Denver 1960-2023)"
)
Boxplots for snow are also created:
dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, snowfall_sum) %>%
ggplot(aes(x=month, y=snowfall_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily snowfall (liquid equivalent mm)",
title="Daily snowfall by month (Denver 1960-2023)"
)
Boxplots for temperature are created:
dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_max) %>%
ggplot(aes(x=month, y=temperature_2m_max)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily high temperature (C)",
title="Daily high temperature by month (Denver 1960-2023)"
)
dfDailyDEN %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_min) %>%
ggplot(aes(x=month, y=temperature_2m_min)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily low temperature (C)",
title="Daily low temperature by month (Denver 1960-2023)"
)
ACF and PACF are explored for maximum temperature:
acfTemp <- acf(dfDailyDEN$temperature_2m_max, lag.max=1000)
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 369 729
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 181 545 915
pacfTemp <- pacf(dfDailyDEN$temperature_2m_max, lag.max=1000)
pacf(dfDailyDEN$temperature_2m_max, lag.max=50)
As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year
ACF and PACF are explored for precipitation:
acfPrecip <- acf(dfDailyDEN$precipitation_sum, lag.max=1000)
acf(dfDailyDEN$precipitation_sum, lag.max=50)
as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
## [1] 57 77 97 127 149 176 197 212 223 242 269 298 316 333 360 371 399 422 455
## [20] 471 498 535 562 580 591 606 629 652 673 687 708 735 754 785 800 819 830 859
## [39] 871 884 905 918 939 953 981
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 16 43 61 91 121 143 169 183 199 217 235 251 265 279 304 337 391 406 427
## [20] 443 463 514 544 575 595 613 627 666 703 730 751 767 798 816 840 866 897 920
## [39] 942 965 979 990
pacfPrecip <- pacf(dfDailyDEN$precipitation_sum, lag.max=1000)
pacf(dfDailyDEN$precipitation_sum, lag.max=50)
Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value
ACF and PACF are explored for wind speed:
acfWind <- acf(dfDailyDEN$windspeed_10m_max, lag.max=1000)
acf(dfDailyDEN$windspeed_10m_max, lag.max=50)
as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
## [1] 46 68 89 111 148 170 197 215 237 282 302 342 373 407 454 476 488 509 535
## [20] 557 584 596 614 708 723 738 811 835 858 880 903 915 927 961
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 76 96 108 141 184 202 229 242 255 272 293 305 326 377 402 424 443 456 473
## [20] 503 515 544 582 602 627 688 713 730 762 776 803 832 870 885 909 921 951 966
pacfWind <- pacf(dfDailyDEN$windspeed_10m_max, lag.max=1000)
pacf(dfDailyDEN$windspeed_10m_max, lag.max=50)
Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are very low in magnitude (~0.05 for wind speed vs. ~0.75 for temperature)
A boxplot for delta daily temperature is created:
dfDailyDEN %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=temperature_2m_max-lag(temperature_2m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum temperature")
Daily temperature changes are generally larger in magnitude during the colder season
A boxplot for delta daily wind speed is created:
dfDailyDEN %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=windspeed_10m_max-lag(windspeed_10m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum wind speed")
Daily wind speed changes are generally similar in magnitude by season
A boxplot for delta daily precipitation is created:
dfDailyDEN %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=precipitation_sum-lag(precipitation_sum)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in precipitation")
Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21 <- dfDailyDEN %>%
mutate(doy=pmin(yday(date), 365)) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21 %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="none")
Outliers are likely much more important in driving average precipitation than in driving average temperature
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- dfDailyDEN %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
ungroup() %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21_sd %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean)) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="none")
Means and standard deviations are plotted together:
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- daily standard deviation",
title="Rolling 21-day mean +/- one rolling 21-day sd"
) +
theme(legend.position="none")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="none")
Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude. Of interest, the seasonal spikiness in precipitationresults in many times of the year that are outside of one standard error of the mean of each other. This is in contrast to cities explored previously where SEM is typically greater than change in means by time of year
Long term daily data is downloaded for Phoenix, AZ:
# Daily data download for Phoenix, AZ
testURLDaily <- helperOpenMeteoURL(cityName="Phoenix AZ",
dailyIndices=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31",
tz="US/Arizona"
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
testURLDaily
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=33.54&longitude=-112.07&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FArizona"
# Download file
if(!file.exists("testOM_daily_phx.json")) {
fileDownload(fileName="testOM_daily_phx.json", url=testURLDaily)
} else {
cat("\nFile testOM_daily_phx.json already exists, skipping download\n")
}
##
## File testOM_daily_phx.json already exists, skipping download
The daily dataset is loaded:
# Read daily JSON file
phxOMDaily <- formatOpenMeteoJSON("testOM_daily_phx.json")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## $tblDaily
## # A tibble: 23,376 × 18
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <int> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 51 7.9 0.3
## 2 1960-01-02 1960-01-02 1 7.4 -3.6
## 3 1960-01-03 1960-01-03 0 9.4 -4
## 4 1960-01-04 1960-01-04 1 10 -3
## 5 1960-01-05 1960-01-05 0 9.4 -3.5
## 6 1960-01-06 1960-01-06 0 9.5 -3.4
## 7 1960-01-07 1960-01-07 0 13 -1.9
## 8 1960-01-08 1960-01-08 0 17 -0.4
## 9 1960-01-09 1960-01-09 3 17.4 1.7
## 10 1960-01-10 1960-01-10 51 13.8 5.9
## # ℹ 23,366 more rows
## # ℹ 13 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <chr>, sunset <chr>,
## # windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>
##
## $tblHourly
## NULL
##
## $tblUnits
## # A tibble: 17 × 4
## metricType name value description
## <chr> <chr> <chr> <chr>
## 1 daily_units time "iso8601" <NA>
## 2 daily_units weathercode "wmo code" The most severe weather co…
## 3 daily_units temperature_2m_max "deg C" Maximum and minimum daily …
## 4 daily_units temperature_2m_min "deg C" Maximum and minimum daily …
## 5 daily_units apparent_temperature_max "deg C" Maximum and minimum daily …
## 6 daily_units apparent_temperature_min "deg C" Maximum and minimum daily …
## 7 daily_units precipitation_sum "mm" Sum of daily precipitation…
## 8 daily_units rain_sum "mm" Sum of daily rain
## 9 daily_units snowfall_sum "cm" Sum of daily snowfall
## 10 daily_units precipitation_hours "h" The number of hours with r…
## 11 daily_units sunrise "iso8601" Sun rise and set times
## 12 daily_units sunset "iso8601" Sun rise and set times
## 13 daily_units windspeed_10m_max "km/h" Maximum wind speed and gus…
## 14 daily_units windgusts_10m_max "km/h" Maximum wind speed and gus…
## 15 daily_units winddirection_10m_dominant "deg " Dominant wind direction
## 16 daily_units shortwave_radiation_sum "MJ/m²" The sum of solar radiaion …
## 17 daily_units et0_fao_evapotranspiration "mm" Daily sum of ET0 Reference…
##
## $tblDescription
## # A tibble: 1 × 7
## latitude longitude generationtime_ms utc_offset_seconds timezone
## <dbl> <dbl> <dbl> <int> <chr>
## 1 33.6 -112. 6061. -25200 US/Arizona
## # ℹ 2 more variables: timezone_abbreviation <chr>, elevation <dbl>
##
##
## latitude: 33.56766
## longitude: -112.0818
## generationtime_ms: 6061.302
## utc_offset_seconds: -25200
## timezone: US/Arizona
## timezone_abbreviation: GMT-7
## elevation: 366
# Sample records of tibble
phxOMDaily$tblDaily %>% glimpse()
## Rows: 23,376
## Columns: 18
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <int> 51, 1, 0, 1, 0, 0, 0, 0, 3, 51, 51, 61, 61,…
## $ temperature_2m_max <dbl> 7.9, 7.4, 9.4, 10.0, 9.4, 9.5, 13.0, 17.0, …
## $ temperature_2m_min <dbl> 0.3, -3.6, -4.0, -3.0, -3.5, -3.4, -1.9, -0…
## $ apparent_temperature_max <dbl> 4.9, 4.1, 6.4, 6.4, 6.3, 6.5, 10.8, 15.2, 1…
## $ apparent_temperature_min <dbl> -3.1, -7.8, -8.2, -7.8, -8.0, -7.7, -5.6, -…
## $ precipitation_sum <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 10, 0, 0, 0, 0, 0, 0, 0, 0, 1, 14, 17, 18, …
## $ sunrise <chr> "1960-01-01T07:32", "1960-01-02T07:32", "19…
## $ sunset <chr> "1960-01-01T17:30", "1960-01-02T17:31", "19…
## $ windspeed_10m_max <dbl> 10.0, 9.9, 10.7, 12.3, 12.4, 10.9, 8.0, 8.4…
## $ windgusts_10m_max <dbl> 29.9, 17.3, 16.9, 24.8, 19.8, 17.3, 14.4, 1…
## $ winddirection_10m_dominant <int> 206, 43, 19, 17, 23, 49, 41, 68, 75, 117, 1…
## $ shortwave_radiation_sum <dbl> 11.08, 11.07, 13.10, 12.98, 13.19, 13.13, 1…
## $ et0_fao_evapotranspiration <dbl> 1.30, 1.26, 1.59, 1.69, 1.65, 1.56, 1.63, 1…
Variables are converted to proper data type:
dfDailyPHX <- phxOMDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
glimpse(dfDailyPHX)
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 51, 1, 0, 1, 0, 0, 0, 0, 3, 51, 51, 61, 61,…
## $ temperature_2m_max <dbl> 7.9, 7.4, 9.4, 10.0, 9.4, 9.5, 13.0, 17.0, …
## $ temperature_2m_min <dbl> 0.3, -3.6, -4.0, -3.0, -3.5, -3.4, -1.9, -0…
## $ apparent_temperature_max <dbl> 4.9, 4.1, 6.4, 6.4, 6.3, 6.5, 10.8, 15.2, 1…
## $ apparent_temperature_min <dbl> -3.1, -7.8, -8.2, -7.8, -8.0, -7.7, -5.6, -…
## $ precipitation_sum <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 10, 0, 0, 0, 0, 0, 0, 0, 0, 1, 14, 17, 18, …
## $ sunrise <dttm> 1960-01-01 07:32:00, 1960-01-02 07:32:00, …
## $ sunset <dttm> 1960-01-01 17:30:00, 1960-01-02 17:31:00, …
## $ windspeed_10m_max <dbl> 10.0, 9.9, 10.7, 12.3, 12.4, 10.9, 8.0, 8.4…
## $ windgusts_10m_max <dbl> 29.9, 17.3, 16.9, 24.8, 19.8, 17.3, 14.4, 1…
## $ winddirection_10m_dominant <int> 206, 43, 19, 17, 23, 49, 41, 68, 75, 117, 1…
## $ shortwave_radiation_sum <dbl> 11.08, 11.07, 13.10, 12.98, 13.19, 13.13, 1…
## $ et0_fao_evapotranspiration <dbl> 1.30, 1.26, 1.59, 1.69, 1.65, 1.56, 1.63, 1…
## $ sunrise_chr <chr> "1960-01-01T07:32", "1960-01-02T07:32", "19…
## $ sunset_chr <chr> "1960-01-01T17:30", "1960-01-02T17:31", "19…
## $ fct_winddir <fct> 206, 43, 19, 17, 23, 49, 41, 68, 75, 117, 1…
Averages by month for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyPHX %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, month, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year, month) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month)) %>%
ggplot(aes(x=month, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Monthly averages for key metrics (Phoenix 1960-2023)")
Averages by month for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyPHX %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(month, wc, winddir) %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=month, y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Phoenix 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Averages by year for select continuous variables are plotted:
tmpMapNames <- c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
)
dfDailyPHX %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(year, temperature_2m_max, temperature_2m_min, precipitation_sum, windspeed_10m_max) %>%
group_by(year) %>%
summarize(across(-c(precipitation_sum), .fns=mean),
across(c(precipitation_sum), .fns=sum),
.groups="drop"
) %>%
pivot_longer(-c(year)) %>%
ggplot(aes(x=year, y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL, y=NULL, title="Annual averages for key metrics (Phoenix 1960-2023)")
Averages by year for select categorical variables are plotted:
tmpMapNames <- c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
)
tmpDFPlot <- dfDailyPHX %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(year, wc, winddir) %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=year, y=n)) +
geom_line(aes(group=value, color=value), lwd=1) +
labs(x=NULL,
y=NULL,
title=paste0(tmpMapNames[x], " (Phoenix 1960-2023)")
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
gridExtra::grid.arrange(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
Boxplots for maximum windspeed are created:
dfDailyPHX %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, windspeed_10m_max) %>%
ggplot(aes(x=month, y=windspeed_10m_max)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Maximum daily wind speed (kph)",
title="Maximum daily windspeed by month (Phoenix 1960-2023)"
)
Boxplots for precipitation are created:
dfDailyPHX %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_hours) %>%
ggplot(aes(x=month, y=precipitation_hours)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation hours",
title="Daily precipitation hours by month (Phoenix 1960-2023)"
)
dfDailyPHX %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, precipitation_sum) %>%
ggplot(aes(x=month, y=precipitation_sum)) +
geom_boxplot(fill="lightblue") +
lims(y=c(0, NA)) +
labs(x=NULL,
y="Daily precipitation (mm)",
title="Daily precipitation by month (Phoenix 1960-2023)"
)
Boxplots for temperature are created:
dfDailyPHX %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_max) %>%
ggplot(aes(x=month, y=temperature_2m_max)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily high temperature (C)",
title="Daily high temperature by month (Phoenix 1960-2023)"
)
dfDailyPHX %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(month, temperature_2m_min) %>%
ggplot(aes(x=month, y=temperature_2m_min)) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y="Daily low temperature (C)",
title="Daily low temperature by month (Phoenix 1960-2023)"
)
ACF and PACF are explored for maximum temperature:
acfTemp <- acf(dfDailyPHX$temperature_2m_max, lag.max=1000)
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which()
## [1] 368 731
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 184 546 913
pacfTemp <- pacf(dfDailyPHX$temperature_2m_max, lag.max=1000)
pacf(dfDailyPHX$temperature_2m_max, lag.max=50)
As expected, ACF has a sustained seasonal pattern, with peaks (at roughly intervals of 365) and valleys (at roughly intervals of 365 offset by roughly 185) corresponding to the 365-day year
ACF and PACF are explored for precipitation:
acfPrecip <- acf(dfDailyPHX$precipitation_sum, lag.max=1000)
acf(dfDailyPHX$precipitation_sum, lag.max=50)
as.vector(acfPrecip$acf) %>% findPeaks(width=21) %>% which()
## [1] 20 40 82 105 138 154 171 192 219 230 252 271 294 324 350 368 387 398 425
## [20] 466 478 500 516 542 560 597 620 632 653 686 730 750 761 777 809 845 866 886
## [39] 899 920 941 964
as.vector(acfPrecip$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 27 48 73 91 119 152 186 205 229 247 267 283 300 318 329 340 365 380 406
## [20] 420 445 486 505 547 566 584 624 645 663 674 700 715 742 754 768 803 819 842
## [39] 856 872 890 909 929 972
pacfPrecip <- pacf(dfDailyPHX$precipitation_sum, lag.max=1000)
pacf(dfDailyPHX$precipitation_sum, lag.max=50)
Precipitation by contrast has little seasonality and mostly just a slight correlation to the previous day’s value
ACF and PACF are explored for wind speed:
acfWind <- acf(dfDailyPHX$windspeed_10m_max, lag.max=1000)
acf(dfDailyPHX$windspeed_10m_max, lag.max=50)
as.vector(acfWind$acf) %>% findPeaks(width=21) %>% which()
## [1] 70 163 194 217 357 481 553 577 600 688 716 744 776 895 923 935 947 974
as.vector(acfWind$acf) %>% findPeaks(width=21, FUN=min) %>% which()
## [1] 26 166 182 197 240 253 365 400 489 521 542 556 585 722 768 890 917 932
pacfWind <- pacf(dfDailyPHX$windspeed_10m_max, lag.max=1000)
pacf(dfDailyPHX$windspeed_10m_max, lag.max=50)
Similar to temperature, wind speed appears to have a sustained seasonal component, though peak correlations are lower in magnitude (~0.20 for wind speed vs. ~0.75 for temperature)
A boxplot for delta daily temperature is created:
dfDailyPHX %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=temperature_2m_max-lag(temperature_2m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum temperature")
Daily temperature changes are generally larger in magnitude during Spring
A boxplot for delta daily wind speed is created:
dfDailyPHX %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=windspeed_10m_max-lag(windspeed_10m_max)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in maximum wind speed")
Daily wind speed changes are generally similar in magnitude by season
A boxplot for delta daily precipitation is created:
dfDailyPHX %>%
mutate(month=factor(month.abb[month(date)], levels=month.abb),
chg=precipitation_sum-lag(precipitation_sum)
) %>%
filter(!is.na(chg)) %>%
ggplot(aes(x=month, y=chg)) +
geom_boxplot() +
labs(x=NULL, y="Daily change in precipitation")
Most days have no precipitation. The corresponding boxplot for precipitation change has small boxes and whiskers with many outliers
Averages for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21 <- dfDailyPHX %>%
mutate(doy=pmin(yday(date), 365)) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21 %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean), n=n()) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean") +
theme(legend.position="none")
Outliers are likely much more important in driving average precipitation than in driving average temperature
Standard deviations for temperature, precipitation, and wind speed are calculated by day of year, using a 21-day rolling window:
df_r21_sd <- dfDailyPHX %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=c(temperature_2m_max, windspeed_10m_max, precipitation_sum), .fns=sd)) %>%
ungroup() %>%
helperRollingAgg(origVar="temperature_2m_max", newName="temp_r21", k=21) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="wind_r21", k=21) %>%
helperRollingAgg(origVar="precipitation_sum", newName="precip_r21", k=21) %>%
select(date, doy, contains("_r21"))
df_r21_sd %>%
na.omit() %>%
group_by(doy) %>%
summarize(across(where(is.numeric), .fns=mean)) %>%
pivot_longer(cols=-c(doy)) %>%
ggplot(aes(x=doy, y=value)) +
geom_line(aes(group=name, color=name)) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year", y="Rolling-21 mean of daily standard deviation") +
theme(legend.position="none")
Means and standard deviations are plotted together:
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD, ymax=Mean+SD, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- daily standard deviation",
title="Rolling 21-day mean +/- one rolling 21-day sd"
) +
theme(legend.position="none")
Means and approximate SEM are plotted together:
nYears <- length(unique(year(df_r21$date)))
nYears
## [1] 64
df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/sqrt(nYears-1), ymax=Mean+SD/sqrt(nYears-1), fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y="Rolling-21 mean +/- 1 SEM (approx)",
title="Rolling 21-day mean +/- 1 SEM (approx)"
) +
theme(legend.position="none")
Consistent with previous analyses, temperature has a strong seasonal pattern with differences in mean that vastly exceed the standard error of the mean. Wind is more similar to temperature in having seasonal variations that meaningfully exceed SEM in magnitude. Of interest, the seasonal spikiness in precipitation results in many times of the year that are outside of one standard error of the mean of each other. This is in contrast to cities explored previously where SEM is typically greater than change in means by time of year
The download process is converted to functional form:
helperNewCityDailyDownload <- function(cityName,
tz,
abb,
fName=paste0("testOM_daily_", abb, ".json"),
idx=1:nrow(tblMetricsDaily),
startDate="1960-01-01",
endDate="2023-12-31"
) {
testURLDaily <- helperOpenMeteoURL(cityName=cityName,
dailyIndices=idx,
startDate=startDate,
endDate=endDate,
tz=tz
)
cat("\nDownload URL:\n")
print(testURLDaily)
# Download file
if(!file.exists(fName)) {
fileDownload(fileName=fName, url=testURLDaily)
} else {
cat("\nFile", fName, "already exists, skipping download\n")
}
}
The function is run for a new city:
helperNewCityDailyDownload(cityName="Kansas City MO", tz = "US/Central", abb="mci")
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
##
##
## Download URL:
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=39.12&longitude=-94.55&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
##
## File testOM_daily_mci.json already exists, skipping download
Creation of an analysis file is converted to functional form:
createDailyDF <- function(fName=NULL, abb=NULL, glimpseFOMJ=FALSE, glimpseList=FALSE, glimpseDF=TRUE) {
# Create fName from abb if fName not passed
if(is.null(fName)) {
if(is.null(abb)) stop("\nMust provide abb or fName to function createDailyDF()\n")
else fName <- paste0("testOM_daily_", abb, ".json")
}
# Read daily JSON file
lstDaily <- formatOpenMeteoJSON(fName, glimpseData=glimpseFOMJ)
# Sample records of tibble inside list
if(isTRUE(glimpseList)) lstDaily$tblDaily %>% glimpse()
# Convert variables to proper type
dfDaily <- lstDaily$tblDaily %>%
mutate(weathercode=factor(weathercode),
sunrise_chr=sunrise,
sunset_chr=sunset,
sunrise=lubridate::ymd_hm(sunrise),
sunset=lubridate::ymd_hm(sunset),
fct_winddir=factor(winddirection_10m_dominant)
)
# Sample records of final tibble
if(isTRUE(glimpseDF)) glimpse(dfDaily)
# Return final tibble
dfDaily
}
The function is tested:
dfDaily <- createDailyDF(abb="mci")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 71, 3, 3, 3, 3, 0, 3, 2, 3, 51, 55, 63, 51,…
## $ temperature_2m_max <dbl> 6.4, 5.5, -3.1, -0.6, -5.1, 5.6, 7.3, 8.6, …
## $ temperature_2m_min <dbl> 0.8, -4.8, -9.6, -12.6, -10.6, -6.6, -1.1, …
## $ apparent_temperature_max <dbl> 1.0, 0.3, -8.7, -5.9, -9.1, 0.2, 3.2, 4.9, …
## $ apparent_temperature_min <dbl> -4.5, -11.6, -15.3, -17.6, -16.7, -12.1, -6…
## $ precipitation_sum <dbl> 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum <dbl> 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.14, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 16, 1, 12,…
## $ sunrise <dttm> 1960-01-01 08:37:00, 1960-01-02 08:37:00, …
## $ sunset <dttm> 1960-01-01 18:05:00, 1960-01-02 18:06:00, …
## $ windspeed_10m_max <dbl> 25.9, 24.6, 22.4, 14.0, 16.2, 18.8, 19.1, 1…
## $ windgusts_10m_max <dbl> 53.3, 49.7, 47.5, 29.9, 35.6, 36.7, 36.7, 2…
## $ winddirection_10m_dominant <int> 153, 263, 306, 268, 346, 221, 232, 171, 193…
## $ shortwave_radiation_sum <dbl> 2.66, 9.49, 9.46, 10.03, 6.87, 10.38, 10.13…
## $ et0_fao_evapotranspiration <dbl> 0.95, 1.25, 1.03, 0.88, 0.72, 1.56, 1.39, 1…
## $ sunrise_chr <chr> "1960-01-01T08:37", "1960-01-02T08:37", "19…
## $ sunset_chr <chr> "1960-01-01T18:05", "1960-01-02T18:06", "19…
## $ fct_winddir <fct> 153, 263, 306, 268, 346, 221, 232, 171, 193…
dfDaily
## # A tibble: 23,376 × 21
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <fct> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 71 6.4 0.8
## 2 1960-01-02 1960-01-02 3 5.5 -4.8
## 3 1960-01-03 1960-01-03 3 -3.1 -9.6
## 4 1960-01-04 1960-01-04 3 -0.6 -12.6
## 5 1960-01-05 1960-01-05 3 -5.1 -10.6
## 6 1960-01-06 1960-01-06 0 5.6 -6.6
## 7 1960-01-07 1960-01-07 3 7.3 -1.1
## 8 1960-01-08 1960-01-08 2 8.6 -2.7
## 9 1960-01-09 1960-01-09 3 11.8 1.6
## 10 1960-01-10 1960-01-10 51 5.9 4
## # ℹ 23,366 more rows
## # ℹ 16 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <dttm>,
## # sunset <dttm>, windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>, sunrise_chr <chr>, sunset_chr <chr>, …
A function is written to plot continuous variable averages by month:
plotContVarMean <- function(df,
tmpMapNames=c("precipitation_sum"="3. Precipitation (mm)",
"windspeed_10m_max"="4. Windspeed (kph)",
"temperature_2m_max"="1. High Temperature (C)",
"temperature_2m_min"="2. Low Temperature (C)"
),
idxMapSum=c(1),
isMonthly=TRUE,
titleStarter=if(isTRUE(isMonthly)) "Monthly" else "Annual",
titleLoc="",
titleEnder=paste0("(", titleLoc, if(str_length(titleLoc)>0) " ", "1960-2023)"),
printPlot=TRUE,
returnPlot=!isTRUE(printPlot)
) {
# FUNCTION ARGUMENTS:
# df: tibble or data frame from createDailyDF()
# tmpMapNames: variables to summarize in format c("variablename"="variable description")
# idxMapSum: indices of the variables in tmpMapNames thatshould be summed rather than averaged
# isMonthly: boolean with TRUE being monthly and FALSE being annual
# titleStarter: opening phrase for title
# titleLoc: location to be used in parentheses at end of title ("" means just show year range)
# titleEnder: ending phrase for title
# printPlot: boolean, should the plot be printed?
# returnPlot: boolean, should the plot be returned?
p1 <- df %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(all_of(c("year", if(isTRUE(isMonthly)) "month", names(tmpMapNames)))) %>%
group_by(pick(if(isTRUE(isMonthly)) c("year", "month") else c("year"))) %>%
summarize(across(-all_of(names(tmpMapNames)[idxMapSum]), .fns=mean),
across(all_of(names(tmpMapNames)[idxMapSum]), .fns=sum),
.groups="drop"
)
if(isTRUE(isMonthly)) {
p1 <- p1 %>%
group_by(month) %>%
summarize(across(-c(year), .fns=mean)) %>%
pivot_longer(-c(month))
} else {
p1 <- p1 %>%
pivot_longer(-c(year))
}
p1 <- p1 %>%
ggplot(aes(x=.data[[if(isTRUE(isMonthly)) "month" else "year"]], y=value)) +
geom_line(aes(group=1)) +
facet_wrap(~tmpMapNames[name], scales="free_y") +
labs(x=NULL,
y=NULL,
title=paste0(titleStarter, " averages for key metrics ", titleEnder)
)
if(isTRUE(printPlot)) print(p1)
if(isTRUE(returnPlot)) p1
}
The function is tested for select continuous variables:
plotContVarMean(dfDaily, isMonthly=TRUE, titleLoc="Kansas City, MO")
plotContVarMean(dfDaily, isMonthly=FALSE, titleLoc="Kansas City, MO")
A function is written to plot continuous variable averages by month:
plotCatVarMean <- function(df,
tmpMapNames=c("wc"="1. Weather Type",
"winddir"="2. Predominant Wind Direction"
),
idxMapSum=c(1),
isMonthly=TRUE,
titleStarter=if(isTRUE(isMonthly)) "Monthly" else "Annual",
titleLoc="",
titleEnder=paste0("(", titleLoc, if(str_length(titleLoc)>0) " ", "1960-2023)"),
printPlot=TRUE,
returnPlot=!isTRUE(printPlot)
) {
tmpDFPlot <- df %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb),
year=year(date),
winddir=case_when(winddirection_10m_dominant>360~"Invalid",
winddirection_10m_dominant>=315~"1. N",
winddirection_10m_dominant>=225~"2. W",
winddirection_10m_dominant>=135~"3. S",
winddirection_10m_dominant>=45~"4. E",
winddirection_10m_dominant>=0~"1. N",
TRUE~"Invalid"
),
wc=case_when(weathercode==0~"1. Clear",
weathercode %in% c(1, 2, 3)~"2. Dry",
weathercode %in% c(51, 53, 55)~"3. Drizzle",
weathercode %in% c(61, 63, 65)~"4. Rain",
weathercode %in% c(71, 73, 75)~"5. Snow",
TRUE~"Error"
)
) %>%
select(all_of(c(if(isTRUE(isMonthly)) "month" else "year", names(tmpMapNames))))
if(isTRUE(isMonthly)) {
tmpDFPlot <- tmpDFPlot %>%
pivot_longer(-c(month)) %>%
count(month, name, value)
} else {
tmpDFPlot <- tmpDFPlot %>%
pivot_longer(-c(year)) %>%
count(year, name, value)
}
tmpPlotFN <- function(x) {
p1 <- tmpDFPlot %>%
filter(name==x) %>%
ggplot(aes(x=.data[[if(isTRUE(isMonthly)) "month" else "year"]], y=n)) +
geom_line(aes(group=value, color=value), lwd=2) +
labs(x=NULL,
y=NULL,
title=paste0(titleStarter, " average for ", tmpMapNames[x], " ", titleEnder)
) +
scale_color_discrete(NULL) +
theme(legend.position = "bottom")
return(p1)
}
# Update to be more flexible
grid::grid.newpage()
p1 <- gridExtra::arrangeGrob(tmpPlotFN("wc"), tmpPlotFN("winddir"), nrow=1)
if(isTRUE(printPlot)) grid::grid.draw(p1)
if(isTRUE(returnPlot)) p1
}
The function is tested for select categorical variables:
plotCatVarMean(dfDaily, isMonthly=TRUE, titleLoc="Kansas City, MO")
plotCatVarMean(dfDaily, isMonthly=FALSE, titleLoc="Kansas City, MO")
A function is written for creating weather boxplots:
omCreateBoxPlot <- function(df,
keyVar,
ymin=NA,
ymax=NA,
mapDesc=c("windspeed_10m_max"="Maximum wind speed (kph)",
"precipitation_hours"="Precipitation (hours)",
"precipitation_sum"="Precipitation (mm)",
"temperature_2m_max"="Maximum Temperature (C)",
"temperature_2m_min"="Minimum Temperature (C)"
),
keyVarDesc=if(keyVar %in% names(mapDesc)) mapDesc[keyVar] else "Key Variable",
titleLoc="",
titleEnder=paste0("(", titleLoc, if(str_length(titleLoc)>0) " ", "1960-2023)")
) {
p1 <- df %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(all_of(c("month", keyVar))) %>%
ggplot(aes(x=month, y=.data[[keyVar]])) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y=keyVarDesc,
title=paste0(keyVarDesc, " by month ", titleEnder)
)
if(!is.na(ymin) | !is.na(ymax)) p1 <- p1 + lims(y=c(ymin, ymax))
print(p1)
}
The function is tested for select variables:
omCreateBoxPlot(dfDaily, keyVar="windspeed_10m_max", ymin=0, titleLoc="Kansas City, MO")
omCreateBoxPlot(dfDaily, keyVar="precipitation_hours", ymin=0, titleLoc="Kansas City, MO")
omCreateBoxPlot(dfDaily, keyVar="precipitation_sum", ymin=0, titleLoc="Kansas City, MO")
omCreateBoxPlot(dfDaily, keyVar="temperature_2m_max", titleLoc="Kansas City, MO")
omCreateBoxPlot(dfDaily, keyVar="temperature_2m_min", titleLoc="Kansas City, MO")
A function is created for running ACF/PACF:
omACFPACF <- function(df, keyVar, bigLag=1000, smallLag=50, smallACF=FALSE, smallPACF=TRUE) {
# Big lag for ACF
acfTemp <- acf(df %>% pull(keyVar), lag.max=bigLag, main=paste0("ACF: ", keyVar))
# Peaks for ACF
cat("\nACF peaks\n")
as.vector(acfTemp$acf) %>% findPeaks(width=21) %>% which() %>% print()
# Troughs for ACF
cat("\nACF troughs\n")
as.vector(acfTemp$acf) %>% findPeaks(width=21, FUN=min) %>% which() %>% print()
# Small lag for ACF (if requested)
if(isTRUE(smallACF)) acfTemp <- acf(df %>% pull(keyVar), lag.max=smallLag, main=paste0("ACF: ", keyVar))
# Big lag for PACF
pacfTemp <- pacf(df %>% pull(keyVar), lag.max=bigLag, main=paste0("PACF: ", keyVar))
# Small lag for PACF (if requested)
if(isTRUE(smallPACF)) pacfTemp <- pacf(df %>% pull(keyVar), lag.max=smallLag, main=paste0("PACF: ", keyVar))
}
The function is tested for select variables:
omACFPACF(dfDaily, keyVar="temperature_2m_max")
##
## ACF peaks
## [1] 369 726
##
## ACF troughs
## [1] 180 547 915
omACFPACF(dfDaily, keyVar="precipitation_sum", smallACF=TRUE)
##
## ACF peaks
## [1] 23 34 53 67 87 115 147 166 187 207 226 243 262 277 288 349 361 374 385
## [20] 404 441 459 481 504 533 561 579 600 620 635 650 673 685 696 712 723 741 756
## [39] 767 785 820 836 881 913 925 948 962 974
##
## ACF troughs
## [1] 24 36 59 76 91 111 129 144 158 184 224 251 269 280 305 319 334 371 389
## [20] 402 420 462 485 508 531 550 563 585 604 628 658 678 731 743 777 799 823 842
## [39] 862 875 893 916 936 955 968 982
omACFPACF(dfDaily, keyVar="windspeed_10m_max", smallACF=TRUE)
##
## ACF peaks
## [1] 140 158 177 202 214 238 373 462 477 489 511 533 561 580 612 628 722 735 788
## [20] 848 883 900 919 933 952 964 982
##
## ACF troughs
## [1] 125 142 160 180 208 221 240 319 486 506 547 589 624 638 840 855 875 897 910
## [20] 925 948 966 989
Function omCreateBoxPlot() is updated to allow for converting a variable to a change:
omCreateBoxPlot <- function(df,
keyVar,
chgLag=NULL,
ymin=NA,
ymax=NA,
mapDesc=c("windspeed_10m_max"="Maximum wind speed (kph)",
"precipitation_hours"="Precipitation (hours)",
"precipitation_sum"="Precipitation (mm)",
"temperature_2m_max"="Maximum Temperature (C)",
"temperature_2m_min"="Minimum Temperature (C)"
),
keyVarDesc=if(keyVar %in% names(mapDesc)) mapDesc[keyVar] else "Key Variable",
titleLoc="",
titleEnder=paste0("(", titleLoc, if(str_length(titleLoc)>0) " ", "1960-2023)")
) {
p1 <- df %>%
mutate(month=factor(month(date), levels=1:12, labels=month.abb), year=year(date)) %>%
select(all_of(c("month", keyVar)))
if(!is.null(chgLag))
p1 <- p1 %>%
mutate(across(all_of(keyVar), .fns=function(x) x - lag(x, chgLag))) %>%
filter(if_all(all_of(keyVar), .fns=function(x) !is.na(x)))
p1 <- p1 %>%
ggplot(aes(x=month, y=.data[[keyVar]])) +
geom_boxplot(fill="lightblue") +
labs(x=NULL,
y=keyVarDesc,
title=paste0(if(is.null(chgLag)) "" else "Change in ", keyVarDesc, " by month ", titleEnder)
)
if(!is.na(ymin) | !is.na(ymax)) p1 <- p1 + lims(y=c(ymin, ymax))
print(p1)
}
The updated function is tested for select variables:
omCreateBoxPlot(dfDaily, keyVar="temperature_2m_max", chgLag=1, titleLoc="Kansas City, MO")
omCreateBoxPlot(dfDaily, keyVar="windspeed_10m_max", chgLag=1, titleLoc="Kansas City, MO")
omCreateBoxPlot(dfDaily, keyVar="precipitation_sum", chgLag=1, titleLoc="Kansas City, MO")
A function is written to calculate means and standard deviations:
omDailyMeanSD <- function(df,
rollVars,
rollK,
mapDesc=c("windspeed_10m_max"="windMax",
"precipitation_hours"="precipHours",
"precipitation_sum"="precipSum",
"temperature_2m_max"="tempMax",
"temperature_2m_min"="tempMin"
),
addSuffix=TRUE,
makeSEM=FALSE,
printPlot=TRUE,
returnPlot=!isTRUE(printPlot)
) {
if(isTRUE(addSuffix)) for(x in names(mapDesc)) mapDesc[x] <- paste0(mapDesc[x], "_r", rollK)
# Calculate means
df_r21 <- df %>% mutate(doy=pmin(yday(date), 365))
for(x in rollVars) {
df_r21 <- df_r21 %>%
helperRollingAgg(origVar=x, newName=unname(mapDesc[x]), k=rollK)
}
df_r21 <- df_r21 %>% select(all_of(c("date", "doy", unname(mapDesc[rollVars]))))
# Calculate standard deviations
df_r21_sd <- df %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(doy) %>%
mutate(across(.cols=all_of(rollVars), .fns=sd)) %>%
ungroup()
for(x in rollVars) {
df_r21_sd <- df_r21_sd %>%
helperRollingAgg(origVar=x, newName=unname(mapDesc[x]), k=rollK)
}
df_r21_sd <- df_r21_sd %>% select(all_of(c("date", "doy", unname(mapDesc[rollVars]))))
# Create plot of means and standard deviations or standard errors of the mean
if(isTRUE(makeSEM)) divBy <- sqrt(length(unique(year(df_r21$date)))-1)
else divBy <- 1
p1 <- df_r21 %>%
bind_rows(df_r21_sd, .id="src") %>%
na.omit() %>%
mutate(musig=c("1"="Mean", "2"="SD")[src]) %>%
group_by(doy, musig) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_longer(cols=-c(doy, musig)) %>%
pivot_wider(id_cols=c(doy, name), names_from="musig") %>%
ggplot(aes(x=doy)) +
geom_line(aes(y=Mean, group=name, color=name), lwd=2) +
geom_ribbon(aes(ymin=Mean-SD/divBy, ymax=Mean+SD/divBy, fill=name), alpha=0.5) +
facet_wrap(~name, scales="free_y") +
labs(x="Day of Year",
y=paste0("Rolling ",
rollK,
"-day mean +/- 1 ",
if(isTRUE(makeSEM)) "SEM (approx)" else "sd"
),
title=paste0("Rolling ",
rollK,
"-day mean +/- 1 rolling ",
rollK,
"-day ",
if(isTRUE(makeSEM)) "SEM (approx)" else "sd"
)
) +
theme(legend.position="none")
if(isTRUE(printPlot)) print(p1)
if(isTRUE(returnPlot)) return(p1)
}
The new function is tested for select variables:
# Standard deviation only
omDailyMeanSD(dfDaily,
rollK=21,
rollVars=c("temperature_2m_max", "windspeed_10m_max", "precipitation_sum")
)
# SEM (approx)
omDailyMeanSD(dfDaily,
rollK=21,
rollVars=c("temperature_2m_max", "windspeed_10m_max", "precipitation_sum"),
makeSEM=TRUE
)
The functions are run in combination for a new city:
# 0. Declare abbreviation and city name to be downloaded
abbCity <- "bna"
tzCity <- "US/Central"
dbNameCity <- "Nashville TN"
useNameCity <- dbNameCity
# 1. Download data
helperNewCityDailyDownload(cityName=dbNameCity, tz = tzCity, abb=abbCity)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
##
##
## Download URL:
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=36.17&longitude=-86.78&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
##
## File testOM_daily_bna.json already exists, skipping download
# 2. Load and process data
dfDaily <- createDailyDF(abb=abbCity)
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 71, 63, 53, 3, 75, 75, 73, 3, 3, 51, 51, 55…
## $ temperature_2m_max <dbl> 8.6, 8.8, 7.4, 3.7, 2.3, 1.3, 4.0, 6.3, 8.3…
## $ temperature_2m_min <dbl> -1.4, 5.7, 0.0, -1.1, -1.3, -0.8, -0.2, -4.…
## $ apparent_temperature_max <dbl> 4.5, 7.2, 5.1, -0.2, -0.5, -1.2, 0.1, 3.9, …
## $ apparent_temperature_min <dbl> -5.5, 1.1, -3.9, -4.5, -4.7, -4.2, -4.3, -8…
## $ precipitation_sum <dbl> 0.4, 13.9, 0.9, 0.0, 12.3, 6.2, 1.8, 0.0, 0…
## $ rain_sum <dbl> 0.0, 13.9, 0.9, 0.0, 0.0, 0.6, 0.7, 0.0, 0.…
## $ snowfall_sum <dbl> 0.28, 0.00, 0.00, 0.00, 8.68, 3.92, 0.77, 0…
## $ precipitation_hours <dbl> 4, 20, 2, 0, 15, 18, 8, 0, 0, 3, 5, 4, 9, 1…
## $ sunrise <dttm> 1960-01-01 07:58:00, 1960-01-02 07:58:00, …
## $ sunset <dttm> 1960-01-01 17:42:00, 1960-01-02 17:43:00, …
## $ windspeed_10m_max <dbl> 16.8, 19.2, 17.7, 9.4, 9.2, 8.7, 14.3, 11.6…
## $ windgusts_10m_max <dbl> 30.2, 50.8, 37.4, 24.8, 18.4, 17.6, 26.6, 2…
## $ winddirection_10m_dominant <int> 122, 183, 294, 258, 25, 30, 238, 211, 183, …
## $ shortwave_radiation_sum <dbl> 5.85, 1.70, 6.13, 6.70, 2.12, 3.83, 10.38, …
## $ et0_fao_evapotranspiration <dbl> 1.07, 0.42, 0.87, 0.75, 0.31, 0.40, 0.86, 1…
## $ sunrise_chr <chr> "1960-01-01T07:58", "1960-01-02T07:58", "19…
## $ sunset_chr <chr> "1960-01-01T17:42", "1960-01-02T17:43", "19…
## $ fct_winddir <fct> 122, 183, 294, 258, 25, 30, 238, 211, 183, …
dfDaily
## # A tibble: 23,376 × 21
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <fct> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 71 8.6 -1.4
## 2 1960-01-02 1960-01-02 63 8.8 5.7
## 3 1960-01-03 1960-01-03 53 7.4 0
## 4 1960-01-04 1960-01-04 3 3.7 -1.1
## 5 1960-01-05 1960-01-05 75 2.3 -1.3
## 6 1960-01-06 1960-01-06 75 1.3 -0.8
## 7 1960-01-07 1960-01-07 73 4 -0.2
## 8 1960-01-08 1960-01-08 3 6.3 -4.8
## 9 1960-01-09 1960-01-09 3 8.3 -3.5
## 10 1960-01-10 1960-01-10 51 14 4.4
## # ℹ 23,366 more rows
## # ℹ 16 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <dttm>,
## # sunset <dttm>, windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>, sunrise_chr <chr>, sunset_chr <chr>, …
# 3. Plot means for continuous variables
plotContVarMean(dfDaily, isMonthly=TRUE, titleLoc=useNameCity)
plotContVarMean(dfDaily, isMonthly=FALSE, titleLoc=useNameCity)
# 4. Plot means for categorical variables
plotCatVarMean(dfDaily, isMonthly=TRUE, titleLoc=useNameCity)
plotCatVarMean(dfDaily, isMonthly=FALSE, titleLoc=useNameCity)
# 5. Create boxplots for select variables
omCreateBoxPlot(dfDaily, keyVar="windspeed_10m_max", ymin=0, titleLoc=useNameCity)
omCreateBoxPlot(dfDaily, keyVar="precipitation_hours", ymin=0, titleLoc=useNameCity)
omCreateBoxPlot(dfDaily, keyVar="precipitation_sum", ymin=0, titleLoc=useNameCity)
omCreateBoxPlot(dfDaily, keyVar="temperature_2m_max", titleLoc=useNameCity)
omCreateBoxPlot(dfDaily, keyVar="temperature_2m_min", titleLoc=useNameCity)
# 6. Analyze ACF/PACF for select variables
omACFPACF(dfDaily, keyVar="temperature_2m_max")
##
## ACF peaks
## [1] 365 736
##
## ACF troughs
## [1] 180 549 914
omACFPACF(dfDaily, keyVar="precipitation_sum", smallACF=TRUE)
##
## ACF peaks
## [1] 15 34 45 80 114 142 156 186 206 227 240 262 275 292 313 343 357 372 386
## [20] 405 417 445 469 483 495 525 572 591 605 626 654 677 689 714 728 767 791 803
## [39] 826 838 856 884 902 931 951 975
##
## ACF troughs
## [1] 26 57 84 95 130 162 180 197 221 232 255 284 305 317 333 355 374 395 419
## [20] 430 446 471 503 521 535 548 568 584 608 642 667 682 700 712 726 761 785 808
## [39] 821 845 859 881 898 912 934 948 960 972
omACFPACF(dfDaily, keyVar="windspeed_10m_max", smallACF=TRUE)
##
## ACF peaks
## [1] 203 349 362 382 552 723 737 901 939
##
## ACF troughs
## [1] 144 174 189 359 535 550 892 904 924
# 7. Create boxplots for select variables (with lag-1, daily difference)
omCreateBoxPlot(dfDaily, keyVar="temperature_2m_max", chgLag=1, titleLoc=useNameCity)
omCreateBoxPlot(dfDaily, keyVar="windspeed_10m_max", chgLag=1, titleLoc=useNameCity)
omCreateBoxPlot(dfDaily, keyVar="precipitation_sum", chgLag=1, titleLoc=useNameCity)
# 8. Create daily mean +/- 1 SD and/or SEM for select variables
omDailyMeanSD(dfDaily, rollK=21, rollVars=c("temperature_2m_max", "windspeed_10m_max", "precipitation_sum"))
omDailyMeanSD(dfDaily, rollK=21, rollVars=c("temperature_2m_max",
"windspeed_10m_max",
"precipitation_sum"
),
makeSEM=TRUE
)
A function is written to run all steps for a new city:
omRunAllSteps <- function(dfDaily=NULL,
dlData=FALSE,
abbCity=NULL,
tzCity=NULL,
useNameCity="New city",
dbNameCity=useNameCity,
returnDF=FALSE,
runStats=TRUE,
mainStatVars=c("temperature_2m_max", "windspeed_10m_max", "precipitation_sum"),
xtraStatVars=c("temperature_2m_min", "precipitation_hours"),
minZeroVars=c("windspeed_10m_max", "precipitation_sum", "precipitation_hours")
) {
# 1. Download data (if requested)
if(isTRUE(dlData)) helperNewCityDailyDownload(cityName=dbNameCity, tz=tzCity, abb=abbCity)
# 2. Load and process data (if not passed)
if(is.null(dfDaily)) dfDaily <- createDailyDF(abb=abbCity)
# Stop processing if requested
if(!isTRUE(runStats)) {
if(isTRUE(returnDF)) {
return(dfDaily)
} else {
return(NULL)
}
}
# 3. Plot means for continuous variables
plotContVarMean(dfDaily, isMonthly=TRUE, titleLoc=useNameCity)
plotContVarMean(dfDaily, isMonthly=FALSE, titleLoc=useNameCity)
# 4. Plot means for categorical variables
plotCatVarMean(dfDaily, isMonthly=TRUE, titleLoc=useNameCity)
plotCatVarMean(dfDaily, isMonthly=FALSE, titleLoc=useNameCity)
# 5. Create boxplots for select variables
purrr::walk(.x=c(mainStatVars, xtraStatVars),
.f=function(x) omCreateBoxPlot(dfDaily,
keyVar=x,
ymin=if(x %in% minZeroVars) 0 else NA,
titleLoc=useNameCity
)
)
# 6. Analyze ACF/PACF for select variables
purrr::walk(.x=mainStatVars,
.f=function(x) omACFPACF(dfDaily, keyVar=x, smallACF=TRUE)
)
# 7. Create boxplots for select variables (with lag-1, daily difference)
purrr::walk(.x=mainStatVars,
.f=function(x) omCreateBoxPlot(dfDaily, keyVar=x, chgLag=1, titleLoc=useNameCity)
)
# 8. Create daily mean +/- 1 SD and/or SEM for select variables
omDailyMeanSD(dfDaily, rollK=21, rollVars=mainStatVars)
omDailyMeanSD(dfDaily, rollK=21, rollVars=mainStatVars, makeSEM=TRUE)
# Return file if requested
if(isTRUE(returnDF)) return(dfDaily)
}
The function is tested solely for loading a dataset:
dfDailyMCI <- omRunAllSteps(abbCity="mci", returnDF=TRUE, runStats=FALSE)
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 71, 3, 3, 3, 3, 0, 3, 2, 3, 51, 55, 63, 51,…
## $ temperature_2m_max <dbl> 6.4, 5.5, -3.1, -0.6, -5.1, 5.6, 7.3, 8.6, …
## $ temperature_2m_min <dbl> 0.8, -4.8, -9.6, -12.6, -10.6, -6.6, -1.1, …
## $ apparent_temperature_max <dbl> 1.0, 0.3, -8.7, -5.9, -9.1, 0.2, 3.2, 4.9, …
## $ apparent_temperature_min <dbl> -4.5, -11.6, -15.3, -17.6, -16.7, -12.1, -6…
## $ precipitation_sum <dbl> 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum <dbl> 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.14, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 7, 16, 1, 12,…
## $ sunrise <dttm> 1960-01-01 08:37:00, 1960-01-02 08:37:00, …
## $ sunset <dttm> 1960-01-01 18:05:00, 1960-01-02 18:06:00, …
## $ windspeed_10m_max <dbl> 25.9, 24.6, 22.4, 14.0, 16.2, 18.8, 19.1, 1…
## $ windgusts_10m_max <dbl> 53.3, 49.7, 47.5, 29.9, 35.6, 36.7, 36.7, 2…
## $ winddirection_10m_dominant <int> 153, 263, 306, 268, 346, 221, 232, 171, 193…
## $ shortwave_radiation_sum <dbl> 2.66, 9.49, 9.46, 10.03, 6.87, 10.38, 10.13…
## $ et0_fao_evapotranspiration <dbl> 0.95, 1.25, 1.03, 0.88, 0.72, 1.56, 1.39, 1…
## $ sunrise_chr <chr> "1960-01-01T08:37", "1960-01-02T08:37", "19…
## $ sunset_chr <chr> "1960-01-01T18:05", "1960-01-02T18:06", "19…
## $ fct_winddir <fct> 153, 263, 306, 268, 346, 221, 232, 171, 193…
The function is tested solely for plots on an existing dataset:
omRunAllSteps(dfDaily=dfDailyMCI, useNameCity="Kansas City MO")
##
## ACF peaks
## [1] 369 726
##
## ACF troughs
## [1] 180 547 915
##
## ACF peaks
## [1] 140 158 177 202 214 238 373 462 477 489 511 533 561 580 612 628 722 735 788
## [20] 848 883 900 919 933 952 964 982
##
## ACF troughs
## [1] 125 142 160 180 208 221 240 319 486 506 547 589 624 638 840 855 875 897 910
## [20] 925 948 966 989
##
## ACF peaks
## [1] 23 34 53 67 87 115 147 166 187 207 226 243 262 277 288 349 361 374 385
## [20] 404 441 459 481 504 533 561 579 600 620 635 650 673 685 696 712 723 741 756
## [39] 767 785 820 836 881 913 925 948 962 974
##
## ACF troughs
## [1] 24 36 59 76 91 111 129 144 158 184 224 251 269 280 305 319 334 371 389
## [20] 402 420 462 485 508 531 550 563 585 604 628 658 678 731 743 777 799 823 842
## [39] 862 875 893 916 936 955 968 982
The function is tested solely for downloading new data:
omRunAllSteps(dlData=TRUE,
abbCity="msp",
tzCity="US/Central",
useNameCity="Minneapolis MN",
runStats=FALSE
)
##
## Daily metrics created from indices: weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration
##
##
## Download URL:
## [1] "https://archive-api.open-meteo.com/v1/archive?latitude=44.96&longitude=-93.27&start_date=1960-01-01&end_date=2023-12-31&daily=weathercode,temperature_2m_max,temperature_2m_min,apparent_temperature_max,apparent_temperature_min,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,sunrise,sunset,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,shortwave_radiation_sum,et0_fao_evapotranspiration&timezone=US%2FCentral"
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 73, 73, 3, 71, 3, 3, 73, 71, 71, 3, 51, 51,…
## $ temperature_2m_max <dbl> -2.3, -1.6, -8.3, -13.4, -14.6, -2.8, 1.6, …
## $ temperature_2m_min <dbl> -5.2, -7.1, -21.4, -27.0, -28.0, -12.4, -3.…
## $ apparent_temperature_max <dbl> -7.7, -4.2, -14.4, -19.6, -21.7, -7.5, -4.1…
## $ apparent_temperature_min <dbl> -10.2, -13.2, -27.3, -32.8, -33.9, -19.2, -…
## $ precipitation_sum <dbl> 6.2, 6.2, 0.0, 0.3, 0.0, 0.0, 1.5, 0.5, 0.3…
## $ rain_sum <dbl> 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0…
## $ snowfall_sum <dbl> 4.34, 4.27, 0.00, 0.21, 0.00, 0.00, 0.98, 0…
## $ precipitation_hours <dbl> 12, 15, 0, 3, 0, 0, 9, 3, 2, 0, 5, 4, 1, 0,…
## $ sunrise <dttm> 1960-01-01 08:51:00, 1960-01-02 08:51:00, …
## $ sunset <dttm> 1960-01-01 17:41:00, 1960-01-02 17:42:00, …
## $ windspeed_10m_max <dbl> 18.7, 19.9, 19.5, 20.2, 25.8, 21.4, 25.0, 2…
## $ windgusts_10m_max <dbl> 37.1, 40.7, 39.2, 40.7, 49.7, 40.0, 47.5, 4…
## $ winddirection_10m_dominant <int> 126, 322, 297, 244, 240, 245, 261, 299, 160…
## $ shortwave_radiation_sum <dbl> 2.19, 3.71, 6.54, 6.30, 7.99, 5.99, 6.70, 7…
## $ et0_fao_evapotranspiration <dbl> 0.39, 0.39, 0.34, 0.26, 0.26, 0.42, 0.62, 0…
## $ sunrise_chr <chr> "1960-01-01T08:51", "1960-01-02T08:51", "19…
## $ sunset_chr <chr> "1960-01-01T17:41", "1960-01-02T17:42", "19…
## $ fct_winddir <fct> 126, 322, 297, 244, 240, 245, 261, 299, 160…
## NULL
The function is tested for creating a dataset from multiple previously processed cities:
omCityMap <- c("atl"="Atlanta GA",
"ord"="Chicago IL",
"dtw"="Detroit MI",
"msy"="New Orleans LA",
"den"="Denver CO"
)
dfMultiCity <- map_dfr(.x=names(omCityMap),
.f=function(x) omRunAllSteps(dlData=FALSE, runStats=FALSE, abbCity=x, returnDF=TRUE),
.id="src"
) %>%
mutate(cityName=unname(omCityMap)[as.integer(src)])
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise <dttm> 1960-01-01 07:42:00, 1960-01-02 07:42:00, …
## $ sunset <dttm> 1960-01-01 17:39:00, 1960-01-02 17:40:00, …
## $ windspeed_10m_max <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…
## $ sunrise_chr <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset_chr <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ fct_winddir <fct> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise <dttm> 1960-01-01 07:18:00, 1960-01-02 07:18:00, …
## $ sunset <dttm> 1960-01-01 16:29:00, 1960-01-02 16:30:00, …
## $ windspeed_10m_max <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…
## $ sunrise_chr <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset_chr <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ fct_winddir <fct> 142, 214, 268, 247, 261, 232, 234, 275, 185…
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise <dttm> 1960-01-01 08:01:00, 1960-01-02 08:01:00, …
## $ sunset <dttm> 1960-01-01 17:10:00, 1960-01-02 17:10:00, …
## $ windspeed_10m_max <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…
## $ sunrise_chr <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset_chr <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ fct_winddir <fct> 109, 196, 261, 247, 257, 239, 223, 272, 161…
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 61, 63, 51, 3, 51, 63, 51, 3, 3, 51, 3, 3, …
## $ temperature_2m_max <dbl> 15.5, 19.1, 17.2, 12.1, 17.9, 18.2, 10.3, 1…
## $ temperature_2m_min <dbl> 10.9, 12.3, 8.6, 6.4, 11.3, 10.3, 6.1, 4.6,…
## $ apparent_temperature_max <dbl> 13.6, 20.8, 15.8, 9.2, 20.1, 20.2, 7.7, 9.1…
## $ apparent_temperature_min <dbl> 6.6, 9.8, 4.1, 2.7, 8.0, 7.3, 2.5, 1.7, 4.8…
## $ precipitation_sum <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ rain_sum <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ snowfall_sum <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ precipitation_hours <dbl> 12, 12, 2, 0, 7, 19, 1, 0, 0, 1, 0, 0, 0, 8…
## $ sunrise <dttm> 1960-01-01 07:55:00, 1960-01-02 07:55:00, …
## $ sunset <dttm> 1960-01-01 18:10:00, 1960-01-02 18:11:00, …
## $ windspeed_10m_max <dbl> 28.9, 18.8, 29.8, 16.2, 21.0, 19.7, 22.7, 1…
## $ windgusts_10m_max <dbl> 53.6, 36.4, 50.0, 27.7, 34.6, 59.4, 39.2, 2…
## $ winddirection_10m_dominant <int> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## $ shortwave_radiation_sum <dbl> 7.86, 4.91, 13.14, 13.63, 4.33, 1.05, 10.67…
## $ et0_fao_evapotranspiration <dbl> 1.05, 0.79, 2.22, 2.17, 1.14, 0.28, 1.30, 1…
## $ sunrise_chr <chr> "1960-01-01T07:55", "1960-01-02T07:55", "19…
## $ sunset_chr <chr> "1960-01-01T18:10", "1960-01-02T18:11", "19…
## $ fct_winddir <fct> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 75, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 51, 73, 7…
## $ temperature_2m_max <dbl> -4.5, -5.7, -7.4, -1.9, -1.5, 3.4, 5.4, 9.6…
## $ temperature_2m_min <dbl> -11.0, -19.2, -22.5, -22.8, -20.8, -16.0, -…
## $ apparent_temperature_max <dbl> -8.9, -10.0, -11.9, -6.2, -5.5, -0.3, 0.5, …
## $ apparent_temperature_min <dbl> -15.8, -24.5, -28.3, -28.8, -26.4, -21.4, -…
## $ precipitation_sum <dbl> 8.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum <dbl> 5.88, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 17,…
## $ sunrise <dttm> 1960-01-01 08:20:00, 1960-01-02 08:20:00, …
## $ sunset <dttm> 1960-01-01 17:45:00, 1960-01-02 17:46:00, …
## $ windspeed_10m_max <dbl> 22.1, 9.8, 13.3, 14.5, 14.3, 13.4, 16.1, 17…
## $ windgusts_10m_max <dbl> 36.0, 15.8, 21.2, 23.4, 21.2, 22.3, 25.9, 2…
## $ winddirection_10m_dominant <int> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
## $ shortwave_radiation_sum <dbl> 6.21, 10.29, 10.29, 11.02, 10.95, 10.69, 10…
## $ et0_fao_evapotranspiration <dbl> 0.51, 0.73, 0.71, 0.89, 0.90, 1.17, 1.26, 1…
## $ sunrise_chr <chr> "1960-01-01T08:20", "1960-01-02T08:20", "19…
## $ sunset_chr <chr> "1960-01-01T17:45", "1960-01-02T17:46", "19…
## $ fct_winddir <fct> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
dfMultiCity %>% count(cityName)
## # A tibble: 5 × 2
## cityName n
## <chr> <int>
## 1 Atlanta GA 23376
## 2 Chicago IL 23376
## 3 Denver CO 23376
## 4 Detroit MI 23376
## 5 New Orleans LA 23376
Mean and SEM are plotted for temperature for each city, as well as the total:
nYears <- length(unique(year(dfMultiCity$date)))
divBy <- sqrt(nYears-1)
dfMultiCity %>%
select(date, temperature_2m_max, cityName) %>%
bind_rows(mutate(., cityName="Overall")) %>%
group_by(cityName, date) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
arrange(cityName, date) %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(cityName) %>%
helperRollingAgg(origVar="temperature_2m_max", newName="mu_r21", k=21) %>%
group_by(cityName, doy) %>%
mutate(sd=sd(temperature_2m_max)) %>%
group_by(cityName) %>%
helperRollingAgg(origVar="sd", newName="sd_r21", k=21) %>%
ungroup() %>%
select(cityName, doy, mu_r21, sd_r21) %>%
pivot_longer(cols=-c(cityName, doy)) %>%
na.omit() %>%
group_by(cityName, doy, name) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_wider(id_cols=c(cityName, doy), names_from="name") %>%
ggplot(aes(x=doy)) +
geom_line(data=~filter(., cityName!="Overall"),
aes(y=mu_r21, group=cityName, color=cityName),
lwd=2
) +
geom_ribbon(data=~filter(., cityName!="Overall"),
aes(ymin=mu_r21-sd_r21/divBy, ymax=mu_r21+sd_r21/divBy, fill=cityName),
alpha=0.5
) +
labs(x="Day of Year",
y=paste0("Rolling ", 21, "-day mean +/- 1 ", if(isTRUE(TRUE)) "SEM (approx)" else "sd"),
title=paste0("Rolling ",
21,
"-day mean +/- 1 rolling ",
21,
"-day ",
if(isTRUE(TRUE)) "SEM (approx)" else "sd", "\nTemperature Max (C)")
) +
scale_color_discrete(NULL) +
scale_fill_discrete(NULL) +
geom_line(data=~filter(., cityName=="Overall"), aes(y=mu_r21, group=cityName), lwd=1, lty=2)
Mean and SEM are plotted for precipitation for each city, as well as the total:
nYears <- length(unique(year(dfMultiCity$date)))
divBy <- sqrt(nYears-1)
dfMultiCity %>%
select(date, precipitation_sum, cityName) %>%
bind_rows(mutate(., cityName="Overall")) %>%
group_by(cityName, date) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
arrange(cityName, date) %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(cityName) %>%
helperRollingAgg(origVar="precipitation_sum", newName="mu_r21", k=21) %>%
group_by(cityName, doy) %>%
mutate(sd=sd(precipitation_sum)) %>%
group_by(cityName) %>%
helperRollingAgg(origVar="sd", newName="sd_r21", k=21) %>%
ungroup() %>%
select(cityName, doy, mu_r21, sd_r21) %>%
pivot_longer(cols=-c(cityName, doy)) %>%
na.omit() %>%
group_by(cityName, doy, name) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_wider(id_cols=c(cityName, doy), names_from="name") %>%
ggplot(aes(x=doy)) +
geom_line(data=~filter(., cityName!="Overall"),
aes(y=mu_r21, group=cityName, color=cityName),
lwd=2
) +
geom_ribbon(data=~filter(., cityName!="Overall"),
aes(ymin=mu_r21-sd_r21/divBy, ymax=mu_r21+sd_r21/divBy, fill=cityName),
alpha=0.5
) +
labs(x="Day of Year",
y=paste0("Rolling ", 21, "-day mean +/- 1 ", if(isTRUE(TRUE)) "SEM (approx)" else "sd"),
title=paste0("Rolling ",
21,
"-day mean +/- 1 rolling ",
21,
"-day ",
if(isTRUE(TRUE)) "SEM (approx)" else "sd", "\nPrecipitation (mm)")
) +
scale_color_discrete(NULL) +
scale_fill_discrete(NULL) +
geom_line(data=~filter(., cityName=="Overall"), aes(y=mu_r21, group=cityName), lwd=1, lty=2)
Mean and SEM are plotted for wind speed for each city, as well as the total:
nYears <- length(unique(year(dfMultiCity$date)))
divBy <- sqrt(nYears-1)
dfMultiCity %>%
select(date, windspeed_10m_max, cityName) %>%
bind_rows(mutate(., cityName="Overall")) %>%
group_by(cityName, date) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
arrange(cityName, date) %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(cityName) %>%
helperRollingAgg(origVar="windspeed_10m_max", newName="mu_r21", k=21) %>%
group_by(cityName, doy) %>%
mutate(sd=sd(windspeed_10m_max)) %>%
group_by(cityName) %>%
helperRollingAgg(origVar="sd", newName="sd_r21", k=21) %>%
ungroup() %>%
select(cityName, doy, mu_r21, sd_r21) %>%
pivot_longer(cols=-c(cityName, doy)) %>%
na.omit() %>%
group_by(cityName, doy, name) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_wider(id_cols=c(cityName, doy), names_from="name") %>%
ggplot(aes(x=doy)) +
geom_line(data=~filter(., cityName!="Overall"),
aes(y=mu_r21, group=cityName, color=cityName),
lwd=2
) +
geom_ribbon(data=~filter(., cityName!="Overall"),
aes(ymin=mu_r21-sd_r21/divBy, ymax=mu_r21+sd_r21/divBy, fill=cityName),
alpha=0.5
) +
labs(x="Day of Year",
y=paste0("Rolling ", 21, "-day mean +/- 1 ", if(isTRUE(TRUE)) "SEM (approx)" else "sd"),
title=paste0("Rolling ",
21,
"-day mean +/- 1 rolling ",
21,
"-day ",
if(isTRUE(TRUE)) "SEM (approx)" else "sd", "\nMax Wind Speed (kph)")
) +
scale_color_discrete(NULL) +
scale_fill_discrete(NULL) +
geom_line(data=~filter(., cityName=="Overall"), aes(y=mu_r21, group=cityName), lwd=1, lty=2)
A function is written to create plots of continuous variables across cities:
omMultiCityCont <- function(df, varName, varDesc, divBy) {
p1 <- df %>%
select(all_of(c("date", varName, "cityName"))) %>%
bind_rows(mutate(., cityName="Overall")) %>%
group_by(cityName, date) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
arrange(cityName, date) %>%
mutate(doy=pmin(yday(date), 365)) %>%
group_by(cityName) %>%
helperRollingAgg(origVar=varName, newName="mu_r21", k=21) %>%
group_by(cityName, doy) %>%
mutate(sd=sd(get(varName))) %>%
group_by(cityName) %>%
helperRollingAgg(origVar="sd", newName="sd_r21", k=21) %>%
ungroup() %>%
select(cityName, doy, mu_r21, sd_r21) %>%
pivot_longer(cols=-c(cityName, doy)) %>%
na.omit() %>%
group_by(cityName, doy, name) %>%
summarize(across(where(is.numeric), .fns=mean), .groups="drop") %>%
pivot_wider(id_cols=c(cityName, doy), names_from="name") %>%
ggplot(aes(x=doy)) +
geom_line(data=~filter(., cityName!="Overall"),
aes(y=mu_r21, group=cityName, color=cityName),
lwd=2
) +
geom_ribbon(data=~filter(., cityName!="Overall"),
aes(ymin=mu_r21-sd_r21/divBy, ymax=mu_r21+sd_r21/divBy, fill=cityName),
alpha=0.5
) +
labs(x="Day of Year",
y=paste0("Rolling ", 21, "-day mean +/- 1 ", if(isTRUE(TRUE)) "SEM (approx)" else "sd"),
title=paste0("Rolling ",
21,
"-day mean +/- 1 rolling ",
21,
"-day ",
if(isTRUE(TRUE)) "SEM (approx)" else "sd", "\n", varDesc)
) +
scale_color_discrete(NULL) +
scale_fill_discrete(NULL) +
geom_line(data=~filter(., cityName=="Overall"), aes(y=mu_r21, group=cityName), lwd=1, lty=2)
p1
}
The function is tested on a new variable:
omMultiCityCont(dfMultiCity,
varName="shortwave_radiation_sum",
varDesc="Daily solar radiation (MJ)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested on evapotranspiration:
omMultiCityCont(dfMultiCity,
varName="et0_fao_evapotranspiration",
varDesc="Daily sum of ET0 Reference Evapotranspiration",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested on snowfall:
omMultiCityCont(dfMultiCity,
varName="snowfall_sum",
varDesc="Daily average snowfall (cm)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested on apparent temperature:
omMultiCityCont(dfMultiCity,
varName="apparent_temperature_max",
varDesc="Maximum Apparent Temperature (C)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested for max-min temperature spread:
omMultiCityCont(dfMultiCity %>% mutate(dTemp=temperature_2m_max-temperature_2m_min),
varName="dTemp",
varDesc="Max-Min Temperature Spread (C)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested for daily change in mean temperature:
omMultiCityCont(dfMultiCity %>%
mutate(muTemp=0.5*(temperature_2m_max+temperature_2m_min)) %>%
arrange(cityName, date) %>%
group_by(cityName) %>%
mutate(dTemp=muTemp-lag(muTemp)) %>%
ungroup() %>%
na.omit(),
varName="dTemp",
varDesc="Daily Change in Mean Temperature (C)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested on wind gusts:
omMultiCityCont(dfMultiCity,
varName="windgusts_10m_max",
varDesc="Daily Maximum Wind Gusts (kph)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested on sunrise:
omMultiCityCont(dfMultiCity %>%
mutate(sr=60*hour(sunrise)+minute(sunrise)),
varName="sr",
varDesc="Sunrise (minutes after midnight, local standard time)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested on sunset:
omMultiCityCont(dfMultiCity %>%
mutate(ss=60*hour(sunset)+minute(sunset)),
varName="ss",
varDesc="Sunset (minutes after midnight, local standard time)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested on daylight length:
omMultiCityCont(dfMultiCity %>%
mutate(dl=60*hour(sunset)+minute(sunset)-60*hour(sunrise)-minute(sunrise)),
varName="dl",
varDesc="Daylight length (minutes)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested for daily change in daylight length:
omMultiCityCont(dfMultiCity %>%
mutate(dl=60*hour(sunset)+minute(sunset)-60*hour(sunrise)-minute(sunrise)) %>%
arrange(cityName, date) %>%
group_by(cityName) %>%
mutate(ddl=dl-lag(dl)) %>%
ungroup() %>%
na.omit(),
varName="ddl",
varDesc="Daily Change in Daylight Length (minutes)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested for daily change in sunrise:
omMultiCityCont(dfMultiCity %>%
mutate(dl=60*hour(sunrise)+minute(sunrise)) %>%
arrange(cityName, date) %>%
group_by(cityName) %>%
mutate(ddl=dl-lag(dl)) %>%
ungroup() %>%
na.omit(),
varName="ddl",
varDesc="Daily Change in Sunrise (minutes)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested for daily change in sunset:
omMultiCityCont(dfMultiCity %>%
mutate(dl=60*hour(sunset)+minute(sunset)) %>%
arrange(cityName, date) %>%
group_by(cityName) %>%
mutate(ddl=dl-lag(dl)) %>%
ungroup() %>%
na.omit(),
varName="ddl",
varDesc="Daily Change in Sunset (minutes)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested on solar noon:
omMultiCityCont(dfMultiCity %>%
mutate(sn=30*hour(sunrise)+0.5*minute(sunrise)+30*hour(sunset)+0.5*minute(sunset)),
varName="sn",
varDesc="Solar noon (minutes after midnight, local standard time)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
The function is tested on change in solar noon:
omMultiCityCont(dfMultiCity %>%
mutate(sn=30*hour(sunrise)+0.5*minute(sunrise)+30*hour(sunset)+0.5*minute(sunset)) %>%
arrange(cityName, date) %>%
group_by(cityName) %>%
mutate(dsn=sn-lag(sn)) %>%
ungroup() %>%
na.omit(),
varName="dsn",
varDesc="Daily change in solar noon (minutes)",
divBy=sqrt(length(unique(year(dfMultiCity$date)))-1)
)
Dominant wind direction is explored:
dfMultiCity %>%
mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360,
fct_month=factor(month.abb[month(date)], levels=month.abb)
) %>%
count(cityName, wd45) %>%
ggplot(aes(x=wd45, y=n)) +
geom_col(fill="lightblue") +
facet_wrap(~cityName) +
coord_polar(start=-pi/8) +
annotate("point", x=0, y=0) +
theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
labs(title="Predominant daily wind direction by city (1960-2023)",
subtitle="Rounded to nearest 45 degrees",
x=NULL,
y=NULL
)
Dominant wind direction is explored by month:
dfMultiCity %>%
mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360,
fct_month=factor(month.abb[month(date)], levels=month.abb)
) %>%
count(fct_month, wd45) %>%
ggplot(aes(x=wd45, y=n)) +
geom_col(fill="lightblue") +
facet_wrap(~fct_month) +
coord_polar(start=-pi/8) +
annotate("point", x=0, y=0) +
theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
labs(title="Predominant daily wind direction by month (5 cities, 1960-2023)",
subtitle="Rounded to nearest 45 degrees",
x=NULL,
y=NULL
)
Dominant wind direction is explored by month, by proportion rather than by raw counts:
dfMultiCity %>%
mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360,
fct_month=factor(month.abb[month(date)], levels=month.abb)
) %>%
count(fct_month, wd45) %>%
group_by(fct_month) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
ggplot(aes(x=wd45, y=pct)) +
geom_col(fill="lightblue") +
facet_wrap(~fct_month) +
coord_polar(start=-pi/8) +
annotate("point", x=0, y=0) +
theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
labs(title="Predominant daily wind direction by month (5 cities, 1960-2023)",
subtitle="Rounded to nearest 45 degrees",
x=NULL,
y=NULL
)
Dominant wind direction is explored by month for a single city:
dfMultiCity %>%
mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360,
fct_month=factor(month.abb[month(date)], levels=month.abb)
) %>%
filter(cityName=="Chicago IL") %>%
count(fct_month, wd45) %>%
group_by(fct_month) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
ggplot(aes(x=wd45, y=pct)) +
geom_col(fill="lightblue") +
facet_wrap(~fct_month) +
coord_polar(start=-pi/8) +
annotate("point", x=0, y=0) +
theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
labs(title="Predominant daily wind direction by month (Chicago IL, 1960-2023)",
subtitle="Rounded to nearest 45 degrees",
x=NULL,
y=NULL
)
Dominant wind direction is explored by city for a single month:
dfMultiCity %>%
mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360,
fct_month=factor(month.abb[month(date)], levels=month.abb)
) %>%
filter(fct_month=="Jun") %>%
count(cityName, wd45) %>%
group_by(cityName) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
ggplot(aes(x=wd45, y=pct)) +
geom_col(fill="lightblue") +
facet_wrap(~cityName) +
coord_polar(start=-pi/8) +
annotate("point", x=0, y=0) +
theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
labs(title="Predominant daily wind direction by city (June, 1960-2023)",
subtitle="Rounded to nearest 45 degrees",
x=NULL,
y=NULL
)
Cities are explored for similarity of wind direction:
pctWindDirCity <- dfMultiCity %>%
mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360,
fct_month=factor(month.abb[month(date)], levels=month.abb),
fct_wd45=factor(wd45,
levels=seq(0, 315, by=45),
labels=c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
)
) %>%
count(cityName, fct_wd45) %>%
group_by(cityName) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
select(cityName, fct_wd45, pct) %>%
pivot_wider(id_cols=c("cityName"), names_from="fct_wd45", values_from="pct")
pctWindDirCity
## # A tibble: 5 × 9
## cityName N NE E SE S SW W NW
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Atlanta GA 0.0620 0.0568 0.178 0.117 0.0959 0.114 0.150 0.226
## 2 Chicago IL 0.112 0.129 0.0722 0.0730 0.154 0.201 0.157 0.101
## 3 Denver CO 0.0945 0.0834 0.0802 0.132 0.319 0.158 0.0745 0.0581
## 4 Detroit MI 0.109 0.103 0.0768 0.0689 0.116 0.217 0.180 0.128
## 5 New Orleans LA 0.114 0.100 0.147 0.188 0.151 0.113 0.0974 0.0890
reshape2::melt(as.matrix(pctWindDirCity %>% select(-cityName) %>% dist()),
varnames = c("row", "col")
) %>%
mutate(city1=pctWindDirCity$cityName[row], city2=pctWindDirCity$cityName[col]) %>%
filter(row!=col) %>%
ggplot(aes(x=city1, y=city2)) +
geom_tile(aes(fill=value)) +
scale_fill_continuous("Dist", low="green", high="red") +
geom_text(aes(label=round(value, 2))) +
labs(title="Distance based on predominant daily wind direction\n(1960-2023)",
x=NULL,
y=NULL,
caption="Predominant daily wind rounded to nearest 45 degrees\n(N, NE, E, SE, S, SW, W, NW)"
)
Months are explored for similarity of wind direction:
pctWindDirMonth <- dfMultiCity %>%
mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360,
fct_month=factor(month.abb[month(date)], levels=month.abb),
fct_wd45=factor(wd45,
levels=seq(0, 315, by=45),
labels=c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
)
) %>%
count(fct_month, fct_wd45) %>%
group_by(fct_month) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
select(fct_month, fct_wd45, pct) %>%
pivot_wider(id_cols=c("fct_month"), names_from="fct_wd45", values_from="pct")
pctWindDirMonth
## # A tibble: 12 × 9
## fct_month N NE E SE S SW W NW
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Jan 0.0851 0.0724 0.0884 0.0961 0.170 0.168 0.166 0.155
## 2 Feb 0.109 0.0948 0.0996 0.0938 0.156 0.149 0.144 0.154
## 3 Mar 0.118 0.100 0.106 0.125 0.167 0.128 0.116 0.140
## 4 Apr 0.111 0.0992 0.105 0.139 0.171 0.135 0.111 0.128
## 5 May 0.108 0.108 0.109 0.150 0.169 0.147 0.110 0.0981
## 6 Jun 0.0848 0.102 0.103 0.118 0.176 0.199 0.129 0.0870
## 7 Jul 0.0824 0.0848 0.0803 0.100 0.147 0.225 0.179 0.102
## 8 Aug 0.0877 0.0981 0.121 0.108 0.155 0.199 0.138 0.0935
## 9 Sep 0.104 0.125 0.169 0.126 0.156 0.136 0.0907 0.0934
## 10 Oct 0.114 0.0979 0.145 0.117 0.163 0.130 0.115 0.117
## 11 Nov 0.0949 0.0824 0.107 0.109 0.192 0.149 0.132 0.134
## 12 Dec 0.0822 0.0713 0.0974 0.107 0.187 0.161 0.151 0.143
reshape2::melt(as.matrix(pctWindDirMonth %>% select(-fct_month) %>% dist()),
varnames = c("row", "col")
) %>%
mutate(city1=pctWindDirMonth$fct_month[row], city2=pctWindDirMonth$fct_month[col]) %>%
filter(row!=col) %>%
ggplot(aes(x=city1, y=city2)) +
geom_tile(aes(fill=value)) +
scale_fill_continuous("Dist", low="green", high="red") +
geom_text(aes(label=round(value, 2))) +
labs(title="Distance based on predominant daily wind direction\n(5 cities, 1960-2023)",
x=NULL,
y=NULL,
caption="Predominant daily wind rounded to nearest 45 degrees\n(N, NE, E, SE, S, SW, W, NW)"
)
Month-city is explored for similarity of wind direction:
pctWindDirMonthCity <- dfMultiCity %>%
mutate(wd45=(45*round(winddirection_10m_dominant/45))%%360,
fct_month=factor(month.abb[month(date)], levels=month.abb),
fct_wd45=factor(wd45,
levels=seq(0, 315, by=45),
labels=c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
)
) %>%
count(cityName, fct_month, fct_wd45) %>%
group_by(cityName, fct_month) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
select(cityName, fct_month, fct_wd45, pct) %>%
pivot_wider(id_cols=c("cityName", "fct_month"), names_from="fct_wd45", values_from="pct")
pctWindDirMonthCity
## # A tibble: 60 × 10
## cityName fct_month N NE E SE S SW W NW
## <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Atlanta GA Jan 0.0509 0.0368 0.137 0.0912 0.0917 0.102 0.176 0.315
## 2 Atlanta GA Feb 0.0476 0.0459 0.146 0.0973 0.0918 0.114 0.160 0.296
## 3 Atlanta GA Mar 0.0570 0.0287 0.134 0.122 0.130 0.126 0.139 0.265
## 4 Atlanta GA Apr 0.0427 0.0339 0.108 0.131 0.149 0.155 0.155 0.225
## 5 Atlanta GA May 0.0726 0.0514 0.141 0.139 0.111 0.138 0.163 0.184
## 6 Atlanta GA Jun 0.0682 0.0599 0.158 0.120 0.0990 0.146 0.192 0.156
## 7 Atlanta GA Jul 0.0539 0.0423 0.132 0.0958 0.0847 0.178 0.231 0.182
## 8 Atlanta GA Aug 0.0595 0.0922 0.235 0.119 0.0781 0.124 0.135 0.157
## 9 Atlanta GA Sep 0.0833 0.114 0.326 0.138 0.0568 0.0521 0.0719 0.159
## 10 Atlanta GA Oct 0.0917 0.0751 0.271 0.119 0.0635 0.0570 0.0988 0.224
## # ℹ 50 more rows
reshape2::melt(as.matrix(pctWindDirMonthCity %>% select(-fct_month, -cityName) %>% dist()),
varnames = c("row", "col")
) %>%
mutate(city1=pctWindDirMonthCity$cityName[row],
city2=pctWindDirMonthCity$cityName[col],
month1=pctWindDirMonthCity$fct_month[row],
month2=pctWindDirMonthCity$fct_month[col]
) %>%
tibble::as_tibble() %>%
select(city1, month1, city2, month2, dist=value, everything()) %>%
filter(row<col) %>%
arrange(dist) %>%
filter(row_number()<=10 | row_number()>(max(row_number())-10))
## # A tibble: 20 × 7
## city1 month1 city2 month2 dist row col
## <chr> <fct> <chr> <fct> <dbl> <int> <int>
## 1 Denver CO Jan Denver CO Dec 0.0220 25 36
## 2 Detroit MI Jun Detroit MI Aug 0.0268 42 44
## 3 Atlanta GA Feb Atlanta GA Dec 0.0295 2 12
## 4 Detroit MI Jan Detroit MI Dec 0.0301 37 48
## 5 Denver CO Feb Denver CO Oct 0.0304 26 34
## 6 Atlanta GA Jan Atlanta GA Feb 0.0310 1 2
## 7 New Orleans LA Jan New Orleans LA Dec 0.0313 49 60
## 8 Detroit MI Apr Detroit MI May 0.0325 40 41
## 9 Denver CO Jan Denver CO Nov 0.0329 25 35
## 10 Detroit MI Jul Detroit MI Aug 0.0373 43 44
## 11 Chicago IL Jan Denver CO Dec 0.465 13 36
## 12 Denver CO Jan Detroit MI Jan 0.467 25 37
## 13 Atlanta GA Feb Denver CO Dec 0.472 2 36
## 14 Atlanta GA Oct Denver CO Jan 0.476 10 25
## 15 Atlanta GA Jan Denver CO Jan 0.478 1 25
## 16 Atlanta GA Oct Denver CO Dec 0.482 10 36
## 17 Denver CO Dec Detroit MI Jan 0.483 36 37
## 18 Atlanta GA Sep Denver CO Jan 0.484 9 25
## 19 Atlanta GA Jan Denver CO Dec 0.487 1 36
## 20 Atlanta GA Sep Denver CO Dec 0.489 9 36
Plots of similar and dissimilar city-months are created:
exCityMonth <- c("Atlanta GASep", "Denver CODec", "Detroit MIJan", "Denver COJan")
pctWindDirMonthCity %>%
filter(paste0(cityName, fct_month) %in% exCityMonth) %>%
pivot_longer(cols=-c(cityName, fct_month), names_to="tmp", values_to="pct") %>%
mutate(wd45=case_when(tmp=="N" ~ 0, tmp=="NE" ~ 45, tmp=="E" ~ 90, tmp=="SE" ~ 135,
tmp=="S" ~ 180, tmp=="SW" ~ 225, tmp=="W" ~ 270, tmp=="NW" ~ 315
)
) %>%
ggplot(aes(x=wd45, y=pct)) +
geom_col(fill="lightblue") +
facet_wrap(~paste0(cityName, " ", fct_month)) +
coord_polar(start=-pi/8) +
annotate("point", x=0, y=0) +
theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
scale_x_continuous(lim=c(-22.5, 337.5), breaks=seq(0, 360, by=45)) +
labs(title="Predominant daily wind direction for select city-month (1960-2023)",
subtitle="Rounded to nearest 45 degrees",
x=NULL,
y=NULL
)
The city-month combinations with the most and least common predominant wind directions are explored:
reshape2::melt(as.matrix(pctWindDirMonthCity %>% select(-fct_month, -cityName) %>% dist()),
varnames = c("row", "col")
) %>%
mutate(city1=pctWindDirMonthCity$cityName[row],
city2=pctWindDirMonthCity$cityName[col],
month1=pctWindDirMonthCity$fct_month[row],
month2=pctWindDirMonthCity$fct_month[col]
) %>%
tibble::as_tibble() %>%
select(city1, month1, city2, month2, dist=value, everything()) %>%
filter(row!=col) %>%
mutate(cityMonth=paste0(city1, month1)) %>%
bind_rows(mutate(., cityMonth=paste0(city2, month2))) %>%
group_by(cityMonth) %>%
summarize(muDist=mean(dist), n=n()) %>%
arrange(muDist) %>%
filter(row_number()<=10 | row_number()>(max(row_number())-10))
## # A tibble: 20 × 3
## cityMonth muDist n
## <chr> <dbl> <int>
## 1 Detroit MISep 0.170 118
## 2 Chicago ILSep 0.175 118
## 3 Detroit MIMay 0.176 118
## 4 Detroit MIApr 0.181 118
## 5 Chicago ILOct 0.182 118
## 6 Detroit MIJun 0.185 118
## 7 Detroit MIMar 0.187 118
## 8 Detroit MIOct 0.188 118
## 9 Detroit MIAug 0.188 118
## 10 Chicago ILMar 0.193 118
## 11 New Orleans LAApr 0.266 118
## 12 New Orleans LAMay 0.274 118
## 13 New Orleans LASep 0.277 118
## 14 Atlanta GAJan 0.281 118
## 15 New Orleans LAOct 0.285 118
## 16 Atlanta GAOct 0.289 118
## 17 Atlanta GASep 0.318 118
## 18 Denver CONov 0.332 118
## 19 Denver COJan 0.343 118
## 20 Denver CODec 0.356 118
Cities are explored for similarity of weather code:
pctWMOCity <- dfMultiCity %>%
mutate(fct_month=factor(month.abb[month(date)], levels=month.abb)) %>%
count(cityName, weathercode) %>%
group_by(cityName) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
select(cityName, weathercode, pct) %>%
pivot_wider(id_cols="cityName", names_from="weathercode", values_from="pct")
pctWMOCity
## # A tibble: 5 × 14
## cityName `0` `1` `2` `3` `51` `53` `55` `61` `63` `65`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Atlanta … 0.0654 0.0479 0.0578 0.324 0.164 0.0750 0.0352 0.0962 0.113 0.0117
## 2 Chicago … 0.0314 0.0335 0.0473 0.374 0.139 0.0636 0.0247 0.0706 0.0815 0.00868
## 3 Denver CO 0.0554 0.0583 0.105 0.389 0.114 0.0503 0.0230 0.0408 0.0261 0.00111
## 4 Detroit … 0.0224 0.0273 0.0389 0.389 0.135 0.0686 0.0306 0.0752 0.0654 0.00398
## 5 New Orle… 0.0783 0.0572 0.0575 0.267 0.183 0.0856 0.0288 0.0882 0.134 0.0190
## # ℹ 3 more variables: `71` <dbl>, `73` <dbl>, `75` <dbl>
reshape2::melt(as.matrix(pctWMOCity %>% select(-cityName) %>% dist()),
varnames = c("row", "col")
) %>%
mutate(city1=pctWMOCity$cityName[row], city2=pctWMOCity$cityName[col]) %>%
filter(row!=col) %>%
ggplot(aes(x=city1, y=city2)) +
geom_tile(aes(fill=value)) +
scale_fill_continuous("Dist", low="green", high="red") +
geom_text(aes(label=round(value, 2))) +
labs(title="Distance based on daily weather codes\n(1960-2023)",
x=NULL,
y=NULL,
caption="WMO weather codes\n(0=clear, 1-3=clouds, 51-55=drizzle, 61-65=rain, 71-75=snow)"
)
Months are explored for similarity of weather code:
pctWMOMonth <- dfMultiCity %>%
mutate(fct_month=factor(month.abb[month(date)], levels=month.abb)) %>%
count(fct_month, weathercode) %>%
group_by(fct_month) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
select(fct_month, weathercode, pct) %>%
pivot_wider(id_cols="fct_month", names_from="weathercode", values_from="pct", values_fill=0)
pctWMOMonth
## # A tibble: 12 × 14
## fct_month `0` `1` `2` `3` `51` `53` `55` `61` `63`
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Jan 0.0469 0.0321 0.0423 0.434 0.0726 0.0382 0.0165 0.0421 0.0456
## 2 Feb 0.0498 0.0269 0.0468 0.426 0.0765 0.0369 0.0146 0.0458 0.0507
## 3 Mar 0.0467 0.0312 0.0466 0.399 0.115 0.0512 0.0215 0.0550 0.0618
## 4 Apr 0.0529 0.0392 0.0512 0.356 0.159 0.0759 0.0289 0.0721 0.0784
## 5 May 0.0388 0.0502 0.0612 0.327 0.182 0.0896 0.0373 0.0967 0.0972
## 6 Jun 0.0280 0.0514 0.0749 0.267 0.191 0.0943 0.0411 0.109 0.132
## 7 Jul 0.0150 0.0433 0.0724 0.230 0.206 0.0990 0.0455 0.118 0.155
## 8 Aug 0.0237 0.0566 0.0747 0.239 0.220 0.0923 0.0383 0.106 0.136
## 9 Sep 0.0670 0.0726 0.0860 0.286 0.190 0.0780 0.0315 0.0838 0.0941
## 10 Oct 0.117 0.0586 0.0785 0.367 0.145 0.0663 0.0251 0.0604 0.0536
## 11 Nov 0.0715 0.0441 0.0536 0.425 0.117 0.055 0.0246 0.0547 0.0527
## 12 Dec 0.0502 0.0312 0.0456 0.433 0.0869 0.0447 0.0159 0.0443 0.0474
## # ℹ 4 more variables: `65` <dbl>, `71` <dbl>, `73` <dbl>, `75` <dbl>
reshape2::melt(as.matrix(pctWMOMonth %>% select(-fct_month) %>% dist()),
varnames = c("row", "col")
) %>%
mutate(city1=pctWMOMonth$fct_month[row], city2=pctWMOMonth$fct_month[col]) %>%
filter(row!=col) %>%
ggplot(aes(x=city1, y=city2)) +
geom_tile(aes(fill=value)) +
scale_fill_continuous("Dist", low="green", high="red") +
geom_text(aes(label=round(value, 2))) +
labs(title="Distance based on daily weather codes\n(1960-2023)",
x=NULL,
y=NULL,
caption="WMO weather codes\n(0=clear, 1-3=clouds, 51-55=drizzle, 61-65=rain, 71-75=snow)"
)
Month-city is explored for similarity of weather code:
pctWMOMonthCity <- dfMultiCity %>%
mutate(fct_month=factor(month.abb[month(date)], levels=month.abb)) %>%
count(cityName, fct_month, weathercode) %>%
group_by(cityName, fct_month) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
select(cityName, fct_month, weathercode, pct) %>%
pivot_wider(id_cols=c("cityName", "fct_month"), names_from="weathercode", values_from="pct", values_fill=0)
pctWMOMonthCity
## # A tibble: 60 × 15
## cityName fct_month `0` `1` `2` `3` `51` `53` `55` `61`
## <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Atlanta GA Jan 0.0655 0.0378 0.0398 0.408 0.107 0.0675 0.0348 0.0887
## 2 Atlanta GA Feb 0.0669 0.0299 0.0492 0.394 0.110 0.0592 0.0321 0.0907
## 3 Atlanta GA Mar 0.0665 0.0378 0.0469 0.391 0.129 0.0620 0.0287 0.0963
## 4 Atlanta GA Apr 0.0828 0.0464 0.0542 0.378 0.138 0.0729 0.0276 0.0807
## 5 Atlanta GA May 0.0469 0.0580 0.0575 0.341 0.170 0.0842 0.0368 0.0887
## 6 Atlanta GA Jun 0.0271 0.0521 0.0609 0.225 0.195 0.101 0.0443 0.130
## 7 Atlanta GA Jul 0.00302 0.0307 0.0605 0.164 0.227 0.102 0.0580 0.153
## 8 Atlanta GA Aug 0.0136 0.0489 0.0660 0.173 0.264 0.0948 0.0489 0.126
## 9 Atlanta GA Sep 0.0589 0.0740 0.0812 0.268 0.228 0.0771 0.0276 0.0865
## 10 Atlanta GA Oct 0.165 0.0660 0.0761 0.348 0.142 0.0570 0.0227 0.0534
## # ℹ 50 more rows
## # ℹ 5 more variables: `63` <dbl>, `65` <dbl>, `71` <dbl>, `73` <dbl>,
## # `75` <dbl>
reshape2::melt(as.matrix(pctWMOMonthCity %>% select(-fct_month, -cityName) %>% dist()),
varnames = c("row", "col")
) %>%
mutate(city1=pctWMOMonthCity$cityName[row],
city2=pctWMOMonthCity$cityName[col],
month1=pctWMOMonthCity$fct_month[row],
month2=pctWMOMonthCity$fct_month[col]
) %>%
tibble::as_tibble() %>%
select(city1, month1, city2, month2, dist=value, everything()) %>%
filter(row<col) %>%
arrange(dist) %>%
filter(row_number()<=10 | row_number()>(max(row_number())-10))
## # A tibble: 20 × 7
## city1 month1 city2 month2 dist row col
## <chr> <fct> <chr> <fct> <dbl> <int> <int>
## 1 Denver CO Jan Denver CO Dec 0.0233 25 36
## 2 Atlanta GA Apr New Orleans LA Jan 0.0236 4 49
## 3 Chicago IL Jan Detroit MI Feb 0.0266 13 38
## 4 Atlanta GA Apr New Orleans LA Dec 0.0271 4 60
## 5 New Orleans LA Jan New Orleans LA Dec 0.0274 49 60
## 6 New Orleans LA Mar New Orleans LA Apr 0.0275 51 52
## 7 Detroit MI Jul Detroit MI Aug 0.0276 43 44
## 8 Chicago IL Mar Detroit MI Mar 0.0279 15 39
## 9 Chicago IL May Detroit MI Jun 0.0286 17 42
## 10 New Orleans LA Jan New Orleans LA Feb 0.0293 49 50
## 11 Chicago IL Feb New Orleans LA Aug 0.548 14 56
## 12 Detroit MI Jan New Orleans LA Aug 0.548 37 56
## 13 Denver CO Nov New Orleans LA Jul 0.549 35 55
## 14 Denver CO Feb New Orleans LA Jul 0.556 26 55
## 15 Denver CO Nov New Orleans LA Aug 0.559 35 56
## 16 Denver CO Feb New Orleans LA Aug 0.565 26 56
## 17 Denver CO Jan New Orleans LA Jul 0.577 25 55
## 18 Denver CO Dec New Orleans LA Jul 0.578 36 55
## 19 Denver CO Jan New Orleans LA Aug 0.588 25 56
## 20 Denver CO Dec New Orleans LA Aug 0.589 36 56
Plots of similar and dissimilar city-months are created:
exCityMonth <- c("New Orleans LAAug", "Denver CODec", "Denver COJan")
pctWMOMonthCity %>%
filter(paste0(cityName, fct_month) %in% exCityMonth) %>%
pivot_longer(cols=-c(cityName, fct_month), names_to="tmp", values_to="pct") %>%
mutate(tmp=factor(tmp)) %>%
ggplot(aes(x=tmp, y=pct)) +
geom_col(fill="lightblue") +
facet_wrap(~paste0(cityName, " ", fct_month)) +
theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
labs(title="Predominant WMO weather code for select city-month (1960-2023)",
x=NULL,
y=NULL,
caption="WMO weather codes\n(0=clear, 1-3=clouds, 51-55=drizzle, 61-65=rain, 71-75=snow)"
)
The city-month combinations with the most and least common predominant WMO weather codes are explored:
reshape2::melt(as.matrix(pctWMOMonthCity %>% select(-fct_month, -cityName) %>% dist()),
varnames = c("row", "col")
) %>%
mutate(city1=pctWMOMonthCity$cityName[row],
city2=pctWMOMonthCity$cityName[col],
month1=pctWMOMonthCity$fct_month[row],
month2=pctWMOMonthCity$fct_month[col]
) %>%
tibble::as_tibble() %>%
select(city1, month1, city2, month2, dist=value, everything()) %>%
filter(row!=col) %>%
mutate(cityMonth=paste0(city1, month1)) %>%
bind_rows(mutate(., cityMonth=paste0(city2, month2))) %>%
group_by(cityMonth) %>%
summarize(muDist=mean(dist), n=n()) %>%
arrange(muDist) %>%
filter(row_number()<=10 | row_number()>(max(row_number())-10))
## # A tibble: 20 × 3
## cityMonth muDist n
## <chr> <dbl> <int>
## 1 Atlanta GAMay 0.154 118
## 2 New Orleans LAFeb 0.159 118
## 3 Detroit MISep 0.159 118
## 4 Atlanta GAApr 0.159 118
## 5 Chicago ILOct 0.160 118
## 6 New Orleans LAJan 0.160 118
## 7 Chicago ILMay 0.161 118
## 8 Chicago ILApr 0.161 118
## 9 New Orleans LADec 0.162 118
## 10 Detroit MIAug 0.163 118
## 11 Chicago ILFeb 0.287 118
## 12 Denver COFeb 0.289 118
## 13 New Orleans LAJun 0.290 118
## 14 Chicago ILJan 0.293 118
## 15 Detroit MIFeb 0.298 118
## 16 Denver COJan 0.302 118
## 17 Denver CODec 0.304 118
## 18 Detroit MIJan 0.318 118
## 19 New Orleans LAJul 0.346 118
## 20 New Orleans LAAug 0.351 118
A random forest is run for predicting month based solely on numeric predictors:
set.seed(25070614)
idxTrain <- sample(1:nrow(dfMultiCity), round(0.75*nrow(dfMultiCity)), replace=FALSE)
rfTestMonth_v001 <- runFullRF(dfTrain=dfMultiCity[idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity[-idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="fct_mon",
xVars=dfMultiCity %>% select(where(is.numeric)) %>% names,
isContVar=FALSE,
returnData=TRUE
)
## Growing trees.. Progress: 33%. Estimated remaining time: 1 minute, 1 seconds.
## Growing trees.. Progress: 71%. Estimated remaining time: 25 seconds.
##
## Accuracy of test data is: 49.76%
The random forest is run including factors as predictors:
set.seed(25070713)
idxTrain <- sample(1:nrow(dfMultiCity), round(0.75*nrow(dfMultiCity)), replace=FALSE)
rfTestMonth_v001 <- runFullRF(dfTrain=dfMultiCity[idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity[-idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="fct_mon",
xVars=dfMultiCity %>%
select(where(is.numeric), where(is.factor), -winddirection_10m_dominant) %>%
names %>%
c("fct_city"),
isContVar=FALSE,
returnData=TRUE
)
## Growing trees.. Progress: 48%. Estimated remaining time: 33 seconds.
## Growing trees.. Progress: 90%. Estimated remaining time: 6 seconds.
##
## Accuracy of test data is: 56.749%
A random forest is run for predicting city based solely on numeric predictors:
set.seed(25070813)
idxTrain <- sample(1:nrow(dfMultiCity), round(0.75*nrow(dfMultiCity)), replace=FALSE)
rfTestCity_v001 <- runFullRF(dfTrain=dfMultiCity[idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity[-idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="fct_city",
xVars=dfMultiCity %>% select(where(is.numeric)) %>% names,
isContVar=FALSE,
returnData=TRUE
)
## Growing trees.. Progress: 47%. Estimated remaining time: 34 seconds.
## Growing trees.. Progress: 96%. Estimated remaining time: 2 seconds.
##
## Accuracy of test data is: 71.821%
The random forest is run including factors as predictors:
set.seed(25070912)
idxTrain <- sample(1:nrow(dfMultiCity), round(0.75*nrow(dfMultiCity)), replace=FALSE)
rfTestCity_v001 <- runFullRF(dfTrain=dfMultiCity[idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity[-idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="fct_city",
xVars=dfMultiCity %>%
select(where(is.numeric), where(is.factor), -winddirection_10m_dominant) %>%
names %>%
c("fct_mon"),
isContVar=FALSE,
returnData=TRUE
)
## Growing trees.. Progress: 56%. Estimated remaining time: 24 seconds.
##
## Accuracy of test data is: 78.015%
A random forest is run for predicting maximum temperature based solely on numeric predictors:
set.seed(25071012)
idxTrain <- sample(1:nrow(dfMultiCity), round(0.75*nrow(dfMultiCity)), replace=FALSE)
rfTestMaxT_v001 <- runFullRF(dfTrain=dfMultiCity[idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity[-idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="temperature_2m_max",
xVars=dfMultiCity %>%
select(where(is.numeric), -winddirection_10m_dominant, -temperature_2m_max) %>%
names,
isContVar=TRUE,
rndTo=-1L,
returnData=TRUE
)
## Growing trees.. Progress: 33%. Estimated remaining time: 1 minute, 2 seconds.
## Growing trees.. Progress: 73%. Estimated remaining time: 23 seconds.
##
## R-squared of test data is: 99.558% (RMSE 0.71 vs. 10.71 null)
## `geom_smooth()` using formula = 'y ~ x'
The random forest is run including factors as predictors:
set.seed(25071111)
idxTrain <- sample(1:nrow(dfMultiCity), round(0.75*nrow(dfMultiCity)), replace=FALSE)
rfTestMaxT_v001 <- runFullRF(dfTrain=dfMultiCity[idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity[-idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="temperature_2m_max",
xVars=dfMultiCity %>%
select(where(is.numeric),
where(is.factor),
-winddirection_10m_dominant,
-temperature_2m_max
) %>%
names,
isContVar=TRUE,
rndTo=-1L,
returnData=TRUE
)
## Growing trees.. Progress: 39%. Estimated remaining time: 48 seconds.
## Growing trees.. Progress: 71%. Estimated remaining time: 25 seconds.
## Growing trees.. Progress: 98%. Estimated remaining time: 1 seconds.
##
## R-squared of test data is: 99.541% (RMSE 0.72 vs. 10.67 null)
## `geom_smooth()` using formula = 'y ~ x'
The random forest is run excluding “temperature” variables:
set.seed(25071213)
idxTrain <- sample(1:nrow(dfMultiCity), round(0.75*nrow(dfMultiCity)), replace=FALSE)
rfTestMaxT_v001 <- runFullRF(dfTrain=dfMultiCity[idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity[-idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="temperature_2m_max",
xVars=dfMultiCity %>%
select(where(is.numeric),
where(is.factor),
-winddirection_10m_dominant,
-temperature_2m_max,
-contains("temp")
) %>%
names,
isContVar=TRUE,
rndTo=-1L,
returnData=TRUE
)
## Growing trees.. Progress: 43%. Estimated remaining time: 40 seconds.
## Growing trees.. Progress: 89%. Estimated remaining time: 7 seconds.
##
## R-squared of test data is: 91.942% (RMSE 3.04 vs. 10.71 null)
## `geom_smooth()` using formula = 'y ~ x'
The random forest is run excluding “temperature” variables but including city and month:
set.seed(25071312)
idxTrain <- sample(1:nrow(dfMultiCity), round(0.75*nrow(dfMultiCity)), replace=FALSE)
rfTestMaxT_v001 <- runFullRF(dfTrain=dfMultiCity[idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity[-idxTrain, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="temperature_2m_max",
xVars=dfMultiCity %>%
select(where(is.numeric),
where(is.factor),
-winddirection_10m_dominant,
-temperature_2m_max,
-contains("temp")
) %>%
names %>%
c("fct_city", "fct_mon"),
isContVar=TRUE,
rndTo=-1L,
returnData=TRUE
)
## Growing trees.. Progress: 36%. Estimated remaining time: 54 seconds.
## Growing trees.. Progress: 69%. Estimated remaining time: 27 seconds.
##
## R-squared of test data is: 94.725% (RMSE 2.44 vs. 10.62 null)
## `geom_smooth()` using formula = 'y ~ x'
K-means clustering is run on the numeric variables, with two clusters to find the first split:
kmTest_v001 <- runKMeans(df=dfMultiCity %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
) %>%
mutate(across(where(is.numeric), .fns=scale)),
vars=dfMultiCity %>% select(where(is.numeric), -winddirection_10m_dominant) %>% names,
centers=2,
nStart=5L,
seed=25071413,
plotMeans=TRUE,
plotPct=list(c("fct_mon", "fct_city")),
returnKM=TRUE
)
Given the variables provided (many related to temperature), the first split is primarily cold season vs. warm season.
K-means is run on the numeric variables, with three clusters to find the second split:
kmTest_v002 <- runKMeans(df=dfMultiCity %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
) %>%
mutate(across(where(is.numeric), .fns=scale)),
vars=dfMultiCity %>% select(where(is.numeric), -winddirection_10m_dominant) %>% names,
centers=3,
nStart=5L,
seed=25071512,
plotMeans=TRUE,
plotPct=list(c("fct_mon", "fct_city")),
returnKM=TRUE
)
The split continues to be primarily cold vs. warm season, with a third segment for days with precipitation and/or above average winds
K-means is run on the numeric variables, with four clusters to find the third split:
kmTest_v003 <- runKMeans(df=dfMultiCity %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
) %>%
mutate(across(where(is.numeric), .fns=scale)),
vars=dfMultiCity %>% select(where(is.numeric), -winddirection_10m_dominant) %>% names,
centers=4,
nStart=5L,
nrowMeans=1L,
seed=25071613,
plotMeans=TRUE,
plotPct=list(c("fct_mon", "fct_city")),
returnKM=TRUE
)
The splits appear to be 1) cold/moderate/warm, and 2) precipitation
K-means is run on the numeric variables, with five clusters to find the fourth split:
kmTest_v004 <- runKMeans(df=dfMultiCity %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
) %>%
mutate(across(where(is.numeric), .fns=scale)),
vars=dfMultiCity %>% select(where(is.numeric), -winddirection_10m_dominant) %>% names,
centers=5,
nStart=10L,
iter.max=25L,
nrowMeans=1L,
seed=25071715,
plotMeans=TRUE,
plotPct=list(c("fct_mon", "fct_city")),
returnKM=TRUE
)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 5844000)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 5844000)
The fourth split appears to differentiate snowy vs. non-snowy days during the cold season
K-means is run on the numeric variables, with six clusters to find the fifth split:
kmTest_v005 <- runKMeans(df=dfMultiCity %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
) %>%
mutate(across(where(is.numeric), .fns=scale)),
vars=dfMultiCity %>% select(where(is.numeric), -winddirection_10m_dominant) %>% names,
centers=6,
nStart=10L,
iter.max=25L,
nrowMeans=1L,
seed=25071715,
plotMeans=TRUE,
plotPct=list(c("fct_mon", "fct_city")),
returnKM=TRUE
)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 5844000)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 5844000)
The fifth split appears to separate particularly windy days
PCA is run on the numeric variables:
pcaTest_v001 <- prcomp(dfMultiCity %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
) %>%
mutate(across(where(is.numeric), .fns=scale)) %>%
select(where(is.numeric), -winddirection_10m_dominant) %>%
as.matrix()
)
tibble::tibble(sdev=c(0, pcaTest_v001$sdev),
n=0:(length(sdev)-1),
csvar=cumsum(sdev**2)/sum(sdev**2)
) %>%
ggplot(aes(x=n, y=csvar)) +
geom_col(fill="lightblue") +
geom_text(aes(label=paste0(round(100*csvar, 1), "%"), vjust=ifelse(n>0, 1, 0))) +
labs(title="% Variance Explained vs. # Components", x="# Components", y="Cumulative Variance Explained")
Given variables provided (many related to temperature), the first five components capture ~95% of the variance
Loadings for the first three components are explored:
pcaTest_v001$rotation %>%
as.data.frame() %>%
rownames_to_column("vrbl") %>%
tibble::as_tibble() %>%
pivot_longer(-c(vrbl)) %>%
ggplot(data=filter(., name %in% c("PC1", "PC2", "PC3")), mapping=aes(x=vrbl, y=value)) +
geom_point(aes(color=name)) +
coord_flip() +
facet_wrap(~name) +
geom_hline(yintercept=0, lty=2) +
scale_color_discrete(NULL) +
labs(title="Loadings for first three components", x=NULL, y=NULL)
The first component loads heaviest on temperature, while the second components loads heavier on rain and the third component loads heavier on wind speed.
Correlations among the numeric variables are assessed:
dfMultiCity %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
) %>%
mutate(across(where(is.numeric), .fns=scale)) %>%
select(where(is.numeric), -winddirection_10m_dominant) %>%
makeHeatMap()
High correlations include 1) temperature, 2) radiation/et0, 3) wind speed/gusts, and 4) precipitation/rain. Of potential interest, snow is the only variable that appears to have few high-magnitude correlations
Loadings for the fourth component are explored:
pcaTest_v001$rotation %>%
as.data.frame() %>%
rownames_to_column("vrbl") %>%
tibble::as_tibble() %>%
pivot_longer(-c(vrbl)) %>%
ggplot(data=filter(., name %in% c("PC1", "PC2", "PC3", "PC4")), mapping=aes(x=vrbl, y=value)) +
geom_point(aes(color=name)) +
coord_flip() +
facet_wrap(~name, nrow=1) +
geom_hline(yintercept=0, lty=2) +
scale_color_discrete(NULL) +
labs(title="Loadings for first four components", x=NULL, y=NULL)
The fourth component loads very heavily on snowfall, so the first four components largely capture the correlation groupings in the heatmap. Previous analysis showed these four components explain ~90% of the variance
Data for an additional city (previously downloaded) are processed:
dfDailyMSP <- omRunAllSteps(dlData=FALSE,
runStats=TRUE,
abbCity="msp",
useNameCity="Minneapolis MN",
returnDF=TRUE
) %>%
mutate(cityName="Minneapolis MN")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 73, 73, 3, 71, 3, 3, 73, 71, 71, 3, 51, 51,…
## $ temperature_2m_max <dbl> -2.3, -1.6, -8.3, -13.4, -14.6, -2.8, 1.6, …
## $ temperature_2m_min <dbl> -5.2, -7.1, -21.4, -27.0, -28.0, -12.4, -3.…
## $ apparent_temperature_max <dbl> -7.7, -4.2, -14.4, -19.6, -21.7, -7.5, -4.1…
## $ apparent_temperature_min <dbl> -10.2, -13.2, -27.3, -32.8, -33.9, -19.2, -…
## $ precipitation_sum <dbl> 6.2, 6.2, 0.0, 0.3, 0.0, 0.0, 1.5, 0.5, 0.3…
## $ rain_sum <dbl> 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0…
## $ snowfall_sum <dbl> 4.34, 4.27, 0.00, 0.21, 0.00, 0.00, 0.98, 0…
## $ precipitation_hours <dbl> 12, 15, 0, 3, 0, 0, 9, 3, 2, 0, 5, 4, 1, 0,…
## $ sunrise <dttm> 1960-01-01 08:51:00, 1960-01-02 08:51:00, …
## $ sunset <dttm> 1960-01-01 17:41:00, 1960-01-02 17:42:00, …
## $ windspeed_10m_max <dbl> 18.7, 19.9, 19.5, 20.2, 25.8, 21.4, 25.0, 2…
## $ windgusts_10m_max <dbl> 37.1, 40.7, 39.2, 40.7, 49.7, 40.0, 47.5, 4…
## $ winddirection_10m_dominant <int> 126, 322, 297, 244, 240, 245, 261, 299, 160…
## $ shortwave_radiation_sum <dbl> 2.19, 3.71, 6.54, 6.30, 7.99, 5.99, 6.70, 7…
## $ et0_fao_evapotranspiration <dbl> 0.39, 0.39, 0.34, 0.26, 0.26, 0.42, 0.62, 0…
## $ sunrise_chr <chr> "1960-01-01T08:51", "1960-01-02T08:51", "19…
## $ sunset_chr <chr> "1960-01-01T17:41", "1960-01-02T17:42", "19…
## $ fct_winddir <fct> 126, 322, 297, 244, 240, 245, 261, 299, 160…
##
## ACF peaks
## [1] 368 726
##
## ACF troughs
## [1] 184 551 915
##
## ACF peaks
## [1] 69 98 124 162 188 221 238 271 299 312 356 374 435 450 461 479 493 537 553
## [20] 577 604 638 653 723 741 756 792 813 837 872 927
##
## ACF troughs
## [1] 56 104 126 160 186 203 230 263 296 320 349 366 378 437 457 482 499 518 540
## [20] 575 623 634 647 666 710 737 748 783 795 807 825 843 878 914 955 989
##
## ACF peaks
## [1] 17 140 164 193 228 241 257 272 303 338 350 361 386 397 412 438 460 484 499
## [20] 531 557 569 601 632 654 684 702 718 740 765 788 834 872 920 934 951 982
##
## ACF troughs
## [1] 13 35 53 65 113 145 166 186 207 230 247 316 343 359 375 390 403 431 451
## [20] 504 515 541 552 564 588 607 628 658 686 704 721 732 753 774 795 815 857 869
## [39] 909 941 967
dfDailyMSP
## # A tibble: 23,376 × 22
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <fct> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 73 -2.3 -5.2
## 2 1960-01-02 1960-01-02 73 -1.6 -7.1
## 3 1960-01-03 1960-01-03 3 -8.3 -21.4
## 4 1960-01-04 1960-01-04 71 -13.4 -27
## 5 1960-01-05 1960-01-05 3 -14.6 -28
## 6 1960-01-06 1960-01-06 3 -2.8 -12.4
## 7 1960-01-07 1960-01-07 73 1.6 -3.8
## 8 1960-01-08 1960-01-08 71 -3.9 -17.9
## 9 1960-01-09 1960-01-09 71 -0.1 -13
## 10 1960-01-10 1960-01-10 3 -3.5 -11.3
## # ℹ 23,366 more rows
## # ℹ 17 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <dttm>,
## # sunset <dttm>, windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>, sunrise_chr <chr>, sunset_chr <chr>, …
An existing random forest is applied for predicting Minneapolis:
runFullRF(dfTrain=NULL,
dfTest=dfDailyMSP %>%
mutate(fct_city=factor(cityName,
levels=dfMultiCity %>% pull(cityName) %>% unique()
),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
useExistingRF=rfTestCity_v001$rf,
yVar="fct_city",
xVars=dfMultiCity %>%
select(where(is.numeric), where(is.factor), -winddirection_10m_dominant) %>%
names %>%
c("fct_mon"),
isContVar=FALSE,
returnData=FALSE
)
##
## Accuracy of test data is: NA%
An existing random forest is applied for predicting temperature:
runFullRF(dfTrain=NULL,
dfTest=dfDailyMSP %>%
mutate(fct_city=factor(cityName,
levels=dfMultiCity %>% pull(cityName) %>% unique()
),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
useExistingRF=rfTestMaxT_v001$rf,
yVar="temperature_2m_max",
isContVar=TRUE,
rndTo=-1L,
returnData=FALSE
)
##
## R-squared of test data is: 80.247% (RMSE 5.78 vs. 13.01 null)
## `geom_smooth()` using formula = 'y ~ x'
The approach is modified by assuming the “most related” city for the new data (Detroit, MI):
runFullRF(dfTrain=NULL,
dfTest=dfDailyMSP %>%
mutate(fct_city=factor("Detroit MI",
levels=dfMultiCity %>% pull(cityName) %>% unique()
),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
useExistingRF=rfTestMaxT_v001$rf,
yVar="temperature_2m_max",
isContVar=TRUE,
rndTo=-1L,
returnData=FALSE
)
##
## R-squared of test data is: 93.771% (RMSE 3.25 vs. 13.01 null)
## `geom_smooth()` using formula = 'y ~ x'
The random forest is re-run including the new data, with numerics and factors as predictors:
omCityMap_v2 <- c("atl"="Atlanta GA",
"ord"="Chicago IL",
"dtw"="Detroit MI",
"msp"="Minneapolis MN",
"msy"="New Orleans LA",
"den"="Denver CO"
)
dfMultiCity_v2 <- map_dfr(.x=names(omCityMap_v2),
.f=function(x) omRunAllSteps(dlData=FALSE, runStats=FALSE, abbCity=x, returnDF=TRUE),
.id="src"
) %>%
mutate(cityName=unname(omCityMap_v2)[as.integer(src)])
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 71, 61, 63, 53, 51, 53, 61, 2, 3, 3, 3, 3, …
## $ temperature_2m_max <dbl> 2.6, 11.4, 14.7, 7.3, 7.1, 7.3, 7.4, 10.7, …
## $ temperature_2m_min <dbl> 0.1, 1.5, 2.0, -0.1, 0.2, 4.5, 3.1, 0.4, -0…
## $ apparent_temperature_max <dbl> -2.3, 9.2, 13.1, 2.8, 4.4, 6.1, 4.9, 6.9, 1…
## $ apparent_temperature_min <dbl> -4.0, -3.1, -2.7, -4.7, -3.3, 1.4, -0.5, -4…
## $ precipitation_sum <dbl> 3.0, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ rain_sum <dbl> 2.5, 7.8, 7.3, 1.1, 0.6, 5.8, 8.7, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.35, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 21, 5, 3, 3, 17, 8, 0, 0, 0, 0, 0, 0, 5…
## $ sunrise <dttm> 1960-01-01 07:42:00, 1960-01-02 07:42:00, …
## $ sunset <dttm> 1960-01-01 17:39:00, 1960-01-02 17:40:00, …
## $ windspeed_10m_max <dbl> 20.9, 18.7, 23.8, 15.5, 9.2, 11.4, 19.2, 17…
## $ windgusts_10m_max <dbl> 39.6, 41.4, 56.2, 44.6, 33.1, 41.8, 39.2, 3…
## $ winddirection_10m_dominant <int> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
## $ shortwave_radiation_sum <dbl> 2.38, 1.83, 10.18, 9.37, 3.88, 1.77, 5.34, …
## $ et0_fao_evapotranspiration <dbl> 0.57, 0.38, 1.34, 1.36, 0.65, 0.33, 0.76, 1…
## $ sunrise_chr <chr> "1960-01-01T07:42", "1960-01-02T07:42", "19…
## $ sunset_chr <chr> "1960-01-01T17:39", "1960-01-02T17:40", "19…
## $ fct_winddir <fct> 90, 106, 295, 318, 2, 69, 288, 313, 140, 24…
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 3, 51, 3, 2, 3, 1, 3, 3, 51, 51, 61, 63, 53…
## $ temperature_2m_max <dbl> 1.6, 4.9, -0.6, -5.9, -6.1, 1.0, 4.9, 1.6, …
## $ temperature_2m_min <dbl> -3.5, -0.9, -7.5, -11.8, -11.2, -10.3, -3.2…
## $ apparent_temperature_max <dbl> -4.3, -0.6, -6.9, -13.1, -13.6, -4.4, -1.3,…
## $ apparent_temperature_min <dbl> -7.9, -7.0, -14.4, -18.4, -17.7, -17.9, -8.…
## $ precipitation_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ rain_sum <dbl> 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 11, 0, 0, 0, 0, 0, 0, 6, 1, 5, 24, 5, 10…
## $ sunrise <dttm> 1960-01-01 07:18:00, 1960-01-02 07:18:00, …
## $ sunset <dttm> 1960-01-01 16:29:00, 1960-01-02 16:30:00, …
## $ windspeed_10m_max <dbl> 22.7, 27.6, 27.0, 29.1, 29.6, 32.1, 32.7, 3…
## $ windgusts_10m_max <dbl> 40.0, 52.6, 44.6, 50.8, 48.2, 52.6, 56.5, 5…
## $ winddirection_10m_dominant <int> 142, 214, 268, 247, 261, 232, 234, 275, 185…
## $ shortwave_radiation_sum <dbl> 7.45, 2.25, 4.58, 8.66, 9.09, 8.79, 5.86, 8…
## $ et0_fao_evapotranspiration <dbl> 0.95, 0.66, 1.06, 1.06, 1.04, 1.33, 1.23, 1…
## $ sunrise_chr <chr> "1960-01-01T07:18", "1960-01-02T07:18", "19…
## $ sunset_chr <chr> "1960-01-01T16:29", "1960-01-02T16:30", "19…
## $ fct_winddir <fct> 142, 214, 268, 247, 261, 232, 234, 275, 185…
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 3, 55, 51, 3, 3, 3, 3, 51, 51, 51, 3, 63, 6…
## $ temperature_2m_max <dbl> 0.4, 3.2, 2.7, -3.3, -6.0, -0.8, 4.1, 2.5, …
## $ temperature_2m_min <dbl> -4.4, -2.5, -2.9, -8.4, -9.7, -10.7, -4.9, …
## $ apparent_temperature_max <dbl> -4.4, -1.1, -1.8, -9.2, -13.1, -7.0, -3.1, …
## $ apparent_temperature_min <dbl> -8.1, -7.4, -8.9, -14.9, -15.2, -18.0, -9.6…
## $ precipitation_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ rain_sum <dbl> 0.0, 7.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.3…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 0, 17, 4, 0, 0, 0, 0, 4, 3, 1, 0, 17, 6, 4,…
## $ sunrise <dttm> 1960-01-01 08:01:00, 1960-01-02 08:01:00, …
## $ sunset <dttm> 1960-01-01 17:10:00, 1960-01-02 17:10:00, …
## $ windspeed_10m_max <dbl> 15.0, 21.5, 25.6, 23.0, 28.7, 30.3, 30.9, 3…
## $ windgusts_10m_max <dbl> 30.2, 42.5, 48.2, 45.0, 54.4, 54.4, 57.6, 6…
## $ winddirection_10m_dominant <int> 109, 196, 261, 247, 257, 239, 223, 272, 161…
## $ shortwave_radiation_sum <dbl> 6.09, 0.93, 4.91, 4.91, 8.07, 6.47, 4.41, 6…
## $ et0_fao_evapotranspiration <dbl> 0.55, 0.33, 0.81, 0.96, 1.04, 1.07, 1.01, 1…
## $ sunrise_chr <chr> "1960-01-01T08:01", "1960-01-02T08:01", "19…
## $ sunset_chr <chr> "1960-01-01T17:10", "1960-01-02T17:10", "19…
## $ fct_winddir <fct> 109, 196, 261, 247, 257, 239, 223, 272, 161…
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 73, 73, 3, 71, 3, 3, 73, 71, 71, 3, 51, 51,…
## $ temperature_2m_max <dbl> -2.3, -1.6, -8.3, -13.4, -14.6, -2.8, 1.6, …
## $ temperature_2m_min <dbl> -5.2, -7.1, -21.4, -27.0, -28.0, -12.4, -3.…
## $ apparent_temperature_max <dbl> -7.7, -4.2, -14.4, -19.6, -21.7, -7.5, -4.1…
## $ apparent_temperature_min <dbl> -10.2, -13.2, -27.3, -32.8, -33.9, -19.2, -…
## $ precipitation_sum <dbl> 6.2, 6.2, 0.0, 0.3, 0.0, 0.0, 1.5, 0.5, 0.3…
## $ rain_sum <dbl> 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0…
## $ snowfall_sum <dbl> 4.34, 4.27, 0.00, 0.21, 0.00, 0.00, 0.98, 0…
## $ precipitation_hours <dbl> 12, 15, 0, 3, 0, 0, 9, 3, 2, 0, 5, 4, 1, 0,…
## $ sunrise <dttm> 1960-01-01 08:51:00, 1960-01-02 08:51:00, …
## $ sunset <dttm> 1960-01-01 17:41:00, 1960-01-02 17:42:00, …
## $ windspeed_10m_max <dbl> 18.7, 19.9, 19.5, 20.2, 25.8, 21.4, 25.0, 2…
## $ windgusts_10m_max <dbl> 37.1, 40.7, 39.2, 40.7, 49.7, 40.0, 47.5, 4…
## $ winddirection_10m_dominant <int> 126, 322, 297, 244, 240, 245, 261, 299, 160…
## $ shortwave_radiation_sum <dbl> 2.19, 3.71, 6.54, 6.30, 7.99, 5.99, 6.70, 7…
## $ et0_fao_evapotranspiration <dbl> 0.39, 0.39, 0.34, 0.26, 0.26, 0.42, 0.62, 0…
## $ sunrise_chr <chr> "1960-01-01T08:51", "1960-01-02T08:51", "19…
## $ sunset_chr <chr> "1960-01-01T17:41", "1960-01-02T17:42", "19…
## $ fct_winddir <fct> 126, 322, 297, 244, 240, 245, 261, 299, 160…
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 61, 63, 51, 3, 51, 63, 51, 3, 3, 51, 3, 3, …
## $ temperature_2m_max <dbl> 15.5, 19.1, 17.2, 12.1, 17.9, 18.2, 10.3, 1…
## $ temperature_2m_min <dbl> 10.9, 12.3, 8.6, 6.4, 11.3, 10.3, 6.1, 4.6,…
## $ apparent_temperature_max <dbl> 13.6, 20.8, 15.8, 9.2, 20.1, 20.2, 7.7, 9.1…
## $ apparent_temperature_min <dbl> 6.6, 9.8, 4.1, 2.7, 8.0, 7.3, 2.5, 1.7, 4.8…
## $ precipitation_sum <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ rain_sum <dbl> 7.1, 18.5, 0.3, 0.0, 0.7, 26.0, 0.1, 0.0, 0…
## $ snowfall_sum <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ precipitation_hours <dbl> 12, 12, 2, 0, 7, 19, 1, 0, 0, 1, 0, 0, 0, 8…
## $ sunrise <dttm> 1960-01-01 07:55:00, 1960-01-02 07:55:00, …
## $ sunset <dttm> 1960-01-01 18:10:00, 1960-01-02 18:11:00, …
## $ windspeed_10m_max <dbl> 28.9, 18.8, 29.8, 16.2, 21.0, 19.7, 22.7, 1…
## $ windgusts_10m_max <dbl> 53.6, 36.4, 50.0, 27.7, 34.6, 59.4, 39.2, 2…
## $ winddirection_10m_dominant <int> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
## $ shortwave_radiation_sum <dbl> 7.86, 4.91, 13.14, 13.63, 4.33, 1.05, 10.67…
## $ et0_fao_evapotranspiration <dbl> 1.05, 0.79, 2.22, 2.17, 1.14, 0.28, 1.30, 1…
## $ sunrise_chr <chr> "1960-01-01T07:55", "1960-01-02T07:55", "19…
## $ sunset_chr <chr> "1960-01-01T18:10", "1960-01-02T18:11", "19…
## $ fct_winddir <fct> 69, 125, 345, 40, 81, 352, 324, 66, 87, 106…
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 75, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 51, 73, 7…
## $ temperature_2m_max <dbl> -4.5, -5.7, -7.4, -1.9, -1.5, 3.4, 5.4, 9.6…
## $ temperature_2m_min <dbl> -11.0, -19.2, -22.5, -22.8, -20.8, -16.0, -…
## $ apparent_temperature_max <dbl> -8.9, -10.0, -11.9, -6.2, -5.5, -0.3, 0.5, …
## $ apparent_temperature_min <dbl> -15.8, -24.5, -28.3, -28.8, -26.4, -21.4, -…
## $ precipitation_sum <dbl> 8.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum <dbl> 5.88, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 17,…
## $ sunrise <dttm> 1960-01-01 08:20:00, 1960-01-02 08:20:00, …
## $ sunset <dttm> 1960-01-01 17:45:00, 1960-01-02 17:46:00, …
## $ windspeed_10m_max <dbl> 22.1, 9.8, 13.3, 14.5, 14.3, 13.4, 16.1, 17…
## $ windgusts_10m_max <dbl> 36.0, 15.8, 21.2, 23.4, 21.2, 22.3, 25.9, 2…
## $ winddirection_10m_dominant <int> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
## $ shortwave_radiation_sum <dbl> 6.21, 10.29, 10.29, 11.02, 10.95, 10.69, 10…
## $ et0_fao_evapotranspiration <dbl> 0.51, 0.73, 0.71, 0.89, 0.90, 1.17, 1.26, 1…
## $ sunrise_chr <chr> "1960-01-01T08:20", "1960-01-02T08:20", "19…
## $ sunset_chr <chr> "1960-01-01T17:45", "1960-01-02T17:46", "19…
## $ fct_winddir <fct> 15, 156, 154, 149, 175, 196, 166, 182, 188,…
dfMultiCity_v2 %>%
count(cityName)
## # A tibble: 6 × 2
## cityName n
## <chr> <int>
## 1 Atlanta GA 23376
## 2 Chicago IL 23376
## 3 Denver CO 23376
## 4 Detroit MI 23376
## 5 Minneapolis MN 23376
## 6 New Orleans LA 23376
set.seed(25072718)
idxTrain_v2 <- sample(1:nrow(dfMultiCity_v2), round(0.75*nrow(dfMultiCity_v2)), replace=FALSE)
rfTestCity_v002 <- runFullRF(dfTrain=dfMultiCity_v2[idxTrain_v2, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity_v2[-idxTrain_v2, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="fct_city",
xVars=dfMultiCity_v2 %>%
select(where(is.numeric), where(is.factor), -winddirection_10m_dominant) %>%
names %>%
c("fct_mon"),
isContVar=FALSE,
returnData=TRUE
)
## Growing trees.. Progress: 30%. Estimated remaining time: 1 minute, 11 seconds.
## Growing trees.. Progress: 59%. Estimated remaining time: 43 seconds.
## Growing trees.. Progress: 87%. Estimated remaining time: 13 seconds.
##
## Accuracy of test data is: 70.699%
The process is updated to focus only on the most difficult cities to separate (Detroit, Minnesota, Chicago):
runFullRF(dfTrain=dfMultiCity_v2[idxTrain_v2, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
) %>%
filter(cityName %in% c("Chicago IL", "Detroit MI", "Minneapolis MN")),
dfTest=dfMultiCity_v2[-idxTrain_v2, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
) %>%
filter(cityName %in% c("Chicago IL", "Detroit MI", "Minneapolis MN")),
yVar="fct_city",
xVars=dfMultiCity_v2 %>%
select(where(is.numeric), where(is.factor), -winddirection_10m_dominant) %>%
names %>%
c("fct_mon"),
isContVar=FALSE,
returnData=FALSE
)
## Warning: Dropped unused factor level(s) in dependent variable: Atlanta GA,
## Denver CO, New Orleans LA.
## Growing trees.. Progress: 64%. Estimated remaining time: 17 seconds.
##
## Accuracy of test data is: 63.19%
The random forest is run excluding “temperature” variables but including city and month:
set.seed(25072914)
idxTrain_v2 <- sample(1:nrow(dfMultiCity_v2), round(0.75*nrow(dfMultiCity_v2)), replace=FALSE)
rfTestMaxT_v002 <- runFullRF(dfTrain=dfMultiCity_v2[idxTrain_v2, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
dfTest=dfMultiCity_v2[-idxTrain_v2, ] %>%
mutate(fct_city=factor(cityName),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
yVar="temperature_2m_max",
xVars=dfMultiCity_v2 %>%
select(where(is.numeric),
where(is.factor),
-winddirection_10m_dominant,
-temperature_2m_max,
-contains("temp")
) %>%
names %>%
c("fct_city", "fct_mon"),
isContVar=TRUE,
rndTo=-1L,
returnData=TRUE
)
## Growing trees.. Progress: 27%. Estimated remaining time: 1 minute, 22 seconds.
## Growing trees.. Progress: 58%. Estimated remaining time: 44 seconds.
## Growing trees.. Progress: 90%. Estimated remaining time: 9 seconds.
##
## R-squared of test data is: 95.155% (RMSE 2.49 vs. 11.3 null)
## `geom_smooth()` using formula = 'y ~ x'
Data for an additional city (previously downloaded) are processed:
dfDailyPHX <- omRunAllSteps(dlData=FALSE,
runStats=TRUE,
abbCity="phx",
useNameCity="Phoenix AZ",
returnDF=TRUE
) %>%
mutate(cityName="Phoenix AZ")
##
## Objects in JSON include: latitude, longitude, generationtime_ms, utc_offset_seconds, timezone, timezone_abbreviation, elevation, daily_units, daily
##
## Rows: 23,376
## Columns: 21
## $ date <date> 1960-01-01, 1960-01-02, 1960-01-03, 1960-0…
## $ time <chr> "1960-01-01", "1960-01-02", "1960-01-03", "…
## $ weathercode <fct> 51, 1, 0, 1, 0, 0, 0, 0, 3, 51, 51, 61, 61,…
## $ temperature_2m_max <dbl> 7.9, 7.4, 9.4, 10.0, 9.4, 9.5, 13.0, 17.0, …
## $ temperature_2m_min <dbl> 0.3, -3.6, -4.0, -3.0, -3.5, -3.4, -1.9, -0…
## $ apparent_temperature_max <dbl> 4.9, 4.1, 6.4, 6.4, 6.3, 6.5, 10.8, 15.2, 1…
## $ apparent_temperature_min <dbl> -3.1, -7.8, -8.2, -7.8, -8.0, -7.7, -5.6, -…
## $ precipitation_sum <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ rain_sum <dbl> 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0…
## $ snowfall_sum <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ precipitation_hours <dbl> 10, 0, 0, 0, 0, 0, 0, 0, 0, 1, 14, 17, 18, …
## $ sunrise <dttm> 1960-01-01 07:32:00, 1960-01-02 07:32:00, …
## $ sunset <dttm> 1960-01-01 17:30:00, 1960-01-02 17:31:00, …
## $ windspeed_10m_max <dbl> 10.0, 9.9, 10.7, 12.3, 12.4, 10.9, 8.0, 8.4…
## $ windgusts_10m_max <dbl> 29.9, 17.3, 16.9, 24.8, 19.8, 17.3, 14.4, 1…
## $ winddirection_10m_dominant <int> 206, 43, 19, 17, 23, 49, 41, 68, 75, 117, 1…
## $ shortwave_radiation_sum <dbl> 11.08, 11.07, 13.10, 12.98, 13.19, 13.13, 1…
## $ et0_fao_evapotranspiration <dbl> 1.30, 1.26, 1.59, 1.69, 1.65, 1.56, 1.63, 1…
## $ sunrise_chr <chr> "1960-01-01T07:32", "1960-01-02T07:32", "19…
## $ sunset_chr <chr> "1960-01-01T17:30", "1960-01-02T17:31", "19…
## $ fct_winddir <fct> 206, 43, 19, 17, 23, 49, 41, 68, 75, 117, 1…
##
## ACF peaks
## [1] 368 731
##
## ACF troughs
## [1] 184 546 913
##
## ACF peaks
## [1] 70 163 194 217 357 481 553 577 600 688 716 744 776 895 923 935 947 974
##
## ACF troughs
## [1] 26 166 182 197 240 253 365 400 489 521 542 556 585 722 768 890 917 932
##
## ACF peaks
## [1] 20 40 82 105 138 154 171 192 219 230 252 271 294 324 350 368 387 398 425
## [20] 466 478 500 516 542 560 597 620 632 653 686 730 750 761 777 809 845 866 886
## [39] 899 920 941 964
##
## ACF troughs
## [1] 27 48 73 91 119 152 186 205 229 247 267 283 300 318 329 340 365 380 406
## [20] 420 445 486 505 547 566 584 624 645 663 674 700 715 742 754 768 803 819 842
## [39] 856 872 890 909 929 972
dfDailyPHX
## # A tibble: 23,376 × 22
## date time weathercode temperature_2m_max temperature_2m_min
## <date> <chr> <fct> <dbl> <dbl>
## 1 1960-01-01 1960-01-01 51 7.9 0.3
## 2 1960-01-02 1960-01-02 1 7.4 -3.6
## 3 1960-01-03 1960-01-03 0 9.4 -4
## 4 1960-01-04 1960-01-04 1 10 -3
## 5 1960-01-05 1960-01-05 0 9.4 -3.5
## 6 1960-01-06 1960-01-06 0 9.5 -3.4
## 7 1960-01-07 1960-01-07 0 13 -1.9
## 8 1960-01-08 1960-01-08 0 17 -0.4
## 9 1960-01-09 1960-01-09 3 17.4 1.7
## 10 1960-01-10 1960-01-10 51 13.8 5.9
## # ℹ 23,366 more rows
## # ℹ 17 more variables: apparent_temperature_max <dbl>,
## # apparent_temperature_min <dbl>, precipitation_sum <dbl>, rain_sum <dbl>,
## # snowfall_sum <dbl>, precipitation_hours <dbl>, sunrise <dttm>,
## # sunset <dttm>, windspeed_10m_max <dbl>, windgusts_10m_max <dbl>,
## # winddirection_10m_dominant <int>, shortwave_radiation_sum <dbl>,
## # et0_fao_evapotranspiration <dbl>, sunrise_chr <chr>, sunset_chr <chr>, …
An existing random forest is applied for predicting Phoenix:
runFullRF(dfTrain=NULL,
dfTest=dfDailyPHX %>%
mutate(fct_city=factor(cityName,
levels=dfMultiCity_v2 %>% pull(cityName) %>% unique()
),
fct_mon=factor(month.abb[month(date)], levels=month.abb)
),
useExistingRF=rfTestCity_v002$rf,
yVar="fct_city",
xVars=dfMultiCity_v2 %>%
select(where(is.numeric), where(is.factor), -winddirection_10m_dominant) %>%
names %>%
c("fct_mon"),
isContVar=FALSE,
returnData=FALSE
)
##
## Accuracy of test data is: NA%